* [kvm-unit-tests PATCH v10 0/3] ARM PMU tests
@ 2016-11-21 20:24 Wei Huang
  2016-11-21 20:24 ` [kvm-unit-tests PATCH v10 1/3] arm: Add PMU test Wei Huang
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Wei Huang @ 2016-11-21 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

Changes from v9:
* Move PMCCNTR related configuration from pmu_init() to sub-tests
* Change the name of loop test function to precise_cycles_loop()
* Print out error detail for each test case in check_cpi()
* Fix cpi conversion from argv
* Change the loop calculation in measure_instrs() after cpi is fixed

Note:
1) Current KVM code has a bug in handling PMCCFILTR writes. A fix (see
below) is required for this unit-test code to work correctly under KVM
mode:
https://lists.cs.columbia.edu/pipermail/kvmarm/2016-November/022134.html

Thanks,
-Wei

Christopher Covington (3):
  arm: Add PMU test
  arm: pmu: Check cycle count increases
  arm: pmu: Add CPI checking

 arm/Makefile.common |   3 +-
 arm/pmu.c           | 347 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg   |  19 +++
 3 files changed, 368 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

-- 
1.8.3.1


* [kvm-unit-tests PATCH v10 1/3] arm: Add PMU test
  2016-11-21 20:24 [kvm-unit-tests PATCH v10 0/3] ARM PMU tests Wei Huang
@ 2016-11-21 20:24 ` Wei Huang
  2016-11-21 20:24 ` [kvm-unit-tests PATCH v10 2/3] arm: pmu: Check cycle count increases Wei Huang
  2016-11-21 20:24 ` [kvm-unit-tests PATCH v10 3/3] arm: pmu: Add CPI checking Wei Huang
  2 siblings, 0 replies; 8+ messages in thread
From: Wei Huang @ 2016-11-21 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

From: Christopher Covington <cov@codeaurora.org>

Beginning with a simple sanity check of the control register, add
a unit test for the ARM Performance Monitors Unit (PMU).

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Wei Huang <wei@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 arm/Makefile.common |  3 ++-
 arm/pmu.c           | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 arm/unittests.cfg   |  5 ++++
 3 files changed, 81 insertions(+), 1 deletion(-)
 create mode 100644 arm/pmu.c

diff --git a/arm/Makefile.common b/arm/Makefile.common
index f37b5c2..5da2fdd 100644
--- a/arm/Makefile.common
+++ b/arm/Makefile.common
@@ -12,7 +12,8 @@ endif
 tests-common = \
 	$(TEST_DIR)/selftest.flat \
 	$(TEST_DIR)/spinlock-test.flat \
-	$(TEST_DIR)/pci-test.flat
+	$(TEST_DIR)/pci-test.flat \
+	$(TEST_DIR)/pmu.flat
 
 all: test_cases
 
diff --git a/arm/pmu.c b/arm/pmu.c
new file mode 100644
index 0000000..9d9c53b
--- /dev/null
+++ b/arm/pmu.c
@@ -0,0 +1,74 @@
+/*
+ * Test the ARM Performance Monitors Unit (PMU).
+ *
+ * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License version 2.1 and
+ * only version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ */
+#include "libcflat.h"
+#include "asm/barrier.h"
+
+#define PMU_PMCR_N_SHIFT   11
+#define PMU_PMCR_N_MASK    0x1f
+#define PMU_PMCR_ID_SHIFT  16
+#define PMU_PMCR_ID_MASK   0xff
+#define PMU_PMCR_IMP_SHIFT 24
+#define PMU_PMCR_IMP_MASK  0xff
+
+#if defined(__arm__)
+static inline uint32_t pmcr_read(void)
+{
+	uint32_t ret;
+
+	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
+	return ret;
+}
+#elif defined(__aarch64__)
+static inline uint32_t pmcr_read(void)
+{
+	uint32_t ret;
+
+	asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
+	return ret;
+}
+#endif
+
+/*
+ * As a simple sanity check on the PMCR_EL0, ensure the implementer field isn't
+ * null. Also print out a couple other interesting fields for diagnostic
+ * purposes. For example, as of fall 2016, QEMU TCG mode doesn't implement
+ * event counters and therefore reports zero event counters, but hopefully
+ * support for at least the instructions event will be added in the future and
+ * the reported number of event counters will become nonzero.
+ */
+static bool check_pmcr(void)
+{
+	uint32_t pmcr;
+
+	pmcr = pmcr_read();
+
+	printf("PMU implementer:     %c\n",
+	       (pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK);
+	printf("Identification code: 0x%x\n",
+	       (pmcr >> PMU_PMCR_ID_SHIFT) & PMU_PMCR_ID_MASK);
+	printf("Event counters:      %d\n",
+	       (pmcr >> PMU_PMCR_N_SHIFT) & PMU_PMCR_N_MASK);
+
+	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
+}
+
+int main(void)
+{
+	report_prefix_push("pmu");
+
+	report("Control register", check_pmcr());
+
+	return report_summary();
+}
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index ae32a42..816f494 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -58,3 +58,8 @@ groups = selftest
 [pci-test]
 file = pci-test.flat
 groups = pci
+
+# Test PMU support
+[pmu]
+file = pmu.flat
+groups = pmu
-- 
1.8.3.1
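The PMCR field decoding in check_pmcr() can be exercised off-target as
well. The following is a minimal standalone sketch that reuses the
patch's shift/mask constants; the register value is made up for
illustration (implementer 'A' = 0x41, identification code 0x03, six
event counters), not read from real hardware:

	#include <stdio.h>
	#include <stdint.h>

	#define PMU_PMCR_N_SHIFT   11
	#define PMU_PMCR_N_MASK    0x1f
	#define PMU_PMCR_ID_SHIFT  16
	#define PMU_PMCR_ID_MASK   0xff
	#define PMU_PMCR_IMP_SHIFT 24
	#define PMU_PMCR_IMP_MASK  0xff

	int main(void)
	{
		/* hypothetical PMCR: 'A' << 24 | 0x03 << 16 | 6 << 11 */
		uint32_t pmcr = 0x41033000;

		printf("PMU implementer:     %c\n",
		       (pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK);
		printf("Identification code: 0x%x\n",
		       (pmcr >> PMU_PMCR_ID_SHIFT) & PMU_PMCR_ID_MASK);
		printf("Event counters:      %d\n",
		       (pmcr >> PMU_PMCR_N_SHIFT) & PMU_PMCR_N_MASK);
		return 0;
	}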


* [kvm-unit-tests PATCH v10 2/3] arm: pmu: Check cycle count increases
  2016-11-21 20:24 [kvm-unit-tests PATCH v10 0/3] ARM PMU tests Wei Huang
  2016-11-21 20:24 ` [kvm-unit-tests PATCH v10 1/3] arm: Add PMU test Wei Huang
@ 2016-11-21 20:24 ` Wei Huang
  2016-11-22 12:42   ` Andrew Jones
  2016-11-21 20:24 ` [kvm-unit-tests PATCH v10 3/3] arm: pmu: Add CPI checking Wei Huang
  2 siblings, 1 reply; 8+ messages in thread
From: Wei Huang @ 2016-11-21 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

From: Christopher Covington <cov@codeaurora.org>

Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
even for the smallest delta of two subsequent reads.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Wei Huang <wei@redhat.com>
---
 arm/pmu.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 156 insertions(+)

diff --git a/arm/pmu.c b/arm/pmu.c
index 9d9c53b..176b070 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -15,6 +15,9 @@
 #include "libcflat.h"
 #include "asm/barrier.h"
 
+#define PMU_PMCR_E         (1 << 0)
+#define PMU_PMCR_C         (1 << 2)
+#define PMU_PMCR_LC        (1 << 6)
 #define PMU_PMCR_N_SHIFT   11
 #define PMU_PMCR_N_MASK    0x1f
 #define PMU_PMCR_ID_SHIFT  16
@@ -22,6 +25,14 @@
 #define PMU_PMCR_IMP_SHIFT 24
 #define PMU_PMCR_IMP_MASK  0xff
 
+#define ID_DFR0_PERFMON_SHIFT 24
+#define ID_DFR0_PERFMON_MASK  0xf
+
+#define PMU_CYCLE_IDX         31
+
+#define NR_SAMPLES 10
+
+static unsigned int pmu_version;
 #if defined(__arm__)
 static inline uint32_t pmcr_read(void)
 {
@@ -30,6 +41,69 @@ static inline uint32_t pmcr_read(void)
 	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
 	return ret;
 }
+
+static inline void pmcr_write(uint32_t value)
+{
+	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
+	isb();
+}
+
+static inline void pmselr_write(uint32_t value)
+{
+	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
+	isb();
+}
+
+static inline void pmxevtyper_write(uint32_t value)
+{
+	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
+}
+
+static inline uint64_t pmccntr_read(void)
+{
+	uint32_t lo, hi = 0;
+
+	if (pmu_version == 0x3)
+		asm volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
+	else
+		asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (lo));
+
+	return ((uint64_t)hi << 32) | lo;
+}
+
+static inline void pmccntr_write(uint64_t value)
+{
+	uint32_t lo, hi;
+
+	lo = value & 0xffffffff;
+	hi = (value >> 32) & 0xffffffff;
+
+	if (pmu_version == 0x3)
+		asm volatile("mcrr p15, 0, %0, %1, c9" : : "r" (lo), "r" (hi));
+	else
+		asm volatile("mcr p15, 0, %0, c9, c13, 0" : : "r" (lo));
+}
+
+static inline void pmcntenset_write(uint32_t value)
+{
+	asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value));
+}
+
+/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
+static inline void pmccfiltr_write(uint32_t value)
+{
+	pmselr_write(PMU_CYCLE_IDX);
+	pmxevtyper_write(value);
+	isb();
+}
+
+static inline uint32_t id_dfr0_read(void)
+{
+	uint32_t val;
+
+	asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
+	return val;
+}
 #elif defined(__aarch64__)
 static inline uint32_t pmcr_read(void)
 {
@@ -38,6 +112,44 @@ static inline uint32_t pmcr_read(void)
 	asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
 	return ret;
 }
+
+static inline void pmcr_write(uint32_t value)
+{
+	asm volatile("msr pmcr_el0, %0" : : "r" (value));
+	isb();
+}
+
+static inline uint64_t pmccntr_read(void)
+{
+	uint64_t cycles;
+
+	asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
+	return cycles;
+}
+
+static inline void pmccntr_write(uint64_t value)
+{
+	asm volatile("msr pmccntr_el0, %0" : : "r" (value));
+}
+
+static inline void pmcntenset_write(uint32_t value)
+{
+	asm volatile("msr pmcntenset_el0, %0" : : "r" (value));
+}
+
+static inline void pmccfiltr_write(uint32_t value)
+{
+	asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
+	isb();
+}
+
+static inline uint32_t id_dfr0_read(void)
+{
+	uint32_t id;
+
+	asm volatile("mrs %0, id_dfr0_el1" : "=r" (id));
+	return id;
+}
 #endif
 
 /*
@@ -64,11 +176,55 @@ static bool check_pmcr(void)
 	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
 }
 
+/*
+ * Ensure that the cycle counter progresses between back-to-back reads.
+ */
+static bool check_cycles_increase(void)
+{
+	bool success = true;
+
+	/* init before event access, this test only cares about cycle count */
+	pmcntenset_write(1 << PMU_CYCLE_IDX);
+	pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */
+	pmccntr_write(0);
+
+	pmcr_write(pmcr_read() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
+
+	for (int i = 0; i < NR_SAMPLES; i++) {
+		uint64_t a, b;
+
+		a = pmccntr_read();
+		b = pmccntr_read();
+
+		if (a >= b) {
+			printf("Read %"PRId64" then %"PRId64".\n", a, b);
+			success = false;
+			break;
+		}
+	}
+
+	pmcr_write(pmcr_read() & ~PMU_PMCR_E);
+
+	return success;
+}
+
+void pmu_init(void)
+{
+	uint32_t dfr0;
+
+	/* probe pmu version */
+	dfr0 = id_dfr0_read();
+	pmu_version = (dfr0 >> ID_DFR0_PERFMON_SHIFT) & ID_DFR0_PERFMON_MASK;
+	printf("PMU version: %d\n", pmu_version);
+}
+
 int main(void)
 {
 	report_prefix_push("pmu");
 
+	pmu_init();
 	report("Control register", check_pmcr());
+	report("Monotonically increasing cycle count", check_cycles_increase());
 
 	return report_summary();
 }
-- 
1.8.3.1
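The sampling pattern in check_cycles_increase() can be prototyped
off-target before running under KVM or TCG. Below is a hedged sketch in
which a mock counter stands in for PMCCNTR_EL0 (mock_pmccntr_read() is
invented for illustration, not part of the patch); it shows why the
failure test is the strict a >= b: two equal back-to-back reads also
count as a stalled counter:

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define NR_SAMPLES 10

	static uint64_t fake_cycles;

	/* hypothetical stand-in for pmccntr_read(); always progresses */
	static uint64_t mock_pmccntr_read(void)
	{
		return fake_cycles += 7;
	}

	static bool check_cycles_increase_mock(void)
	{
		for (int i = 0; i < NR_SAMPLES; i++) {
			uint64_t a = mock_pmccntr_read();
			uint64_t b = mock_pmccntr_read();

			/* a == b fails too: the counter must progress */
			if (a >= b)
				return false;
		}
		return true;
	}

	int main(void)
	{
		printf("mock check: %s\n",
		       check_cycles_increase_mock() ? "pass" : "fail");
		return 0;
	}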


* [kvm-unit-tests PATCH v10 3/3] arm: pmu: Add CPI checking
  2016-11-21 20:24 [kvm-unit-tests PATCH v10 0/3] ARM PMU tests Wei Huang
  2016-11-21 20:24 ` [kvm-unit-tests PATCH v10 1/3] arm: Add PMU test Wei Huang
  2016-11-21 20:24 ` [kvm-unit-tests PATCH v10 2/3] arm: pmu: Check cycle count increases Wei Huang
@ 2016-11-21 20:24 ` Wei Huang
  2016-11-21 21:40   ` Christopher Covington
  2 siblings, 1 reply; 8+ messages in thread
From: Wei Huang @ 2016-11-21 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

From: Christopher Covington <cov@codeaurora.org>

Calculate the number of cycles per instruction (CPI) implied by ARM
PMU cycle counter values. The code includes a strict checking facility
intended for TCG's -icount mode; it is enabled via the configuration file.

Signed-off-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Wei Huang <wei@redhat.com>
---
 arm/pmu.c         | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 arm/unittests.cfg |  14 +++++++
 2 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/arm/pmu.c b/arm/pmu.c
index 176b070..129ef1e 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -104,6 +104,25 @@ static inline uint32_t id_dfr0_read(void)
 	asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
 	return val;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting. Total cycles = isb + mcr + 2*loop = 2 + 2*loop.
+ */
+static inline void precise_cycles_loop(int loop, uint32_t pmcr)
+{
+	asm volatile(
+	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
+	"	isb\n"
+	"1:	subs	%[loop], %[loop], #1\n"
+	"	bgt	1b\n"
+	"	mcr	p15, 0, %[z], c9, c12, 0\n"
+	"	isb\n"
+	: [loop] "+r" (loop)
+	: [pmcr] "r" (pmcr), [z] "r" (0)
+	: "cc");
+}
 #elif defined(__aarch64__)
 static inline uint32_t pmcr_read(void)
 {
@@ -150,6 +169,25 @@ static inline uint32_t id_dfr0_read(void)
 	asm volatile("mrs %0, id_dfr0_el1" : "=r" (id));
 	return id;
 }
+
+/*
+ * Extra instructions inserted by the compiler would be difficult to compensate
+ * for, so hand assemble everything between, and including, the PMCR accesses
+ * to start and stop counting. Total cycles = isb + msr + 2*loop = 2 + 2*loop.
+ */
+static inline void precise_cycles_loop(int loop, uint32_t pmcr)
+{
+	asm volatile(
+	"	msr	pmcr_el0, %[pmcr]\n"
+	"	isb\n"
+	"1:	subs	%[loop], %[loop], #1\n"
+	"	b.gt	1b\n"
+	"	msr	pmcr_el0, xzr\n"
+	"	isb\n"
+	: [loop] "+r" (loop)
+	: [pmcr] "r" (pmcr)
+	: "cc");
+}
 #endif
 
 /*
@@ -208,6 +246,79 @@ static bool check_cycles_increase(void)
 	return success;
 }
 
+/*
+ * Execute a known number of guest instructions. Only odd instruction counts
+ * greater than or equal to 3 are supported by the in-line assembly code. The
+ * control register (PMCR_EL0) is initialized with the provided value (allowing
+ * for example for the cycle counter or event counters to be reset). At the end
+ * of the exact instruction loop, zero is written to PMCR_EL0 to disable
+ * counting, allowing the cycle counter or event counters to be read at the
+ * leisure of the calling code.
+ */
+static void measure_instrs(int num, uint32_t pmcr)
+{
+	int loop = (num - 2) / 2;
+
+	assert(num >= 4 && ((num - 2) % 2 == 0));
+	precise_cycles_loop(loop, pmcr);
+}
+
+/*
+ * Measure cycle counts for various known instruction counts. Ensure that the
+ * cycle counter progresses (similar to check_cycles_increase() but with more
+ * instructions and using reset and stop controls). If supplied a positive,
+ * nonzero CPI parameter, also strictly check that every measurement matches
+ * it. Strict CPI checking is used to test -icount mode.
+ */
+static bool check_cpi(int cpi)
+{
+	uint32_t pmcr = pmcr_read() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
+
+	/* init before event access, this test only cares about cycle count */
+	pmcntenset_write(1 << PMU_CYCLE_IDX);
+	pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */
+
+	if (cpi > 0)
+		printf("Checking for CPI=%d.\n", cpi);
+	printf("instrs : cycles0 cycles1 ...\n");
+
+	for (unsigned int i = 4; i < 300; i += 32) {
+		uint64_t avg, sum = 0;
+
+		printf("%d :", i);
+		for (int j = 0; j < NR_SAMPLES; j++) {
+			uint64_t cycles;
+
+			pmccntr_write(0);
+			measure_instrs(i, pmcr);
+			cycles = pmccntr_read();
+			printf(" %"PRId64"", cycles);
+
+			if (!cycles) {
+				printf("\ncycles not incrementing!\n");
+				return false;
+			} else if (cpi > 0 && cycles != i * cpi) {
+				printf("\nunexpected cycle count received!\n");
+				return false;
+			} else if ((cycles >> 32) != 0) {
+				/* The cycles taken by the loop above should
+				 * fit in 32 bits easily. We check the upper
+				 * 32 bits of the cycle counter to make sure
+				 * there is no surprise. */
+				printf("\ncycle count bigger than 32bit!\n");
+				return false;
+			}
+
+			sum += cycles;
+		}
+		avg = sum / NR_SAMPLES;
+		printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
+		       "avg_cpi=%"PRId64"\n", sum, avg, i / avg, avg / i);
+	}
+
+	return true;
+}
+
 void pmu_init(void)
 {
 	uint32_t dfr0;
@@ -218,13 +329,19 @@ void pmu_init(void)
 	printf("PMU version: %d\n", pmu_version);
 }
 
-int main(void)
+int main(int argc, char *argv[])
 {
+	int cpi = 0;
+
+	if (argc > 1)
+		cpi = atol(argv[1]);
+
 	report_prefix_push("pmu");
 
 	pmu_init();
 	report("Control register", check_pmcr());
 	report("Monotonically increasing cycle count", check_cycles_increase());
+	report("Cycle/instruction ratio", check_cpi(cpi));
 
 	return report_summary();
 }
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 816f494..044d97c 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -63,3 +63,17 @@ groups = pci
 [pmu]
 file = pmu.flat
 groups = pmu
+
+# Test PMU support (TCG) with -icount, CPI=1
+[pmu-tcg-icount-1]
+file = pmu.flat
+extra_params = -icount 0 -append '1'
+groups = pmu
+accel = tcg
+
+# Test PMU support (TCG) with -icount, CPI=256
+[pmu-tcg-icount-256]
+file = pmu.flat
+extra_params = -icount 8 -append '256'
+groups = pmu
+accel = tcg
-- 
1.8.3.1
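A quick sanity check of the instruction accounting in measure_instrs():
per the loop comment, the counted sequence costs 2 + 2*loop instructions
with loop = (num - 2) / 2, and under strict -icount checking check_cpi()
expects cycles == num * cpi. Worked through for instruction counts the
test actually uses (i = 4, 36, ..., 292) at CPI=256 (-icount 8):

	num = 4    ->  loop = 1    ->  2 + 2*1   = 4    ->  4 * 256   = 1024 cycles
	num = 36   ->  loop = 17   ->  2 + 2*17  = 36   ->  36 * 256  = 9216 cycles
	num = 292  ->  loop = 145  ->  2 + 2*145 = 292  ->  292 * 256 = 74752 cycles

Even the largest expected value is far below 2^32, which is why the
(cycles >> 32) != 0 branch in check_cpi() should never be taken under
-icount; it exists to catch surprises.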


* [kvm-unit-tests PATCH v10 3/3] arm: pmu: Add CPI checking
  2016-11-21 20:24 ` [kvm-unit-tests PATCH v10 3/3] arm: pmu: Add CPI checking Wei Huang
@ 2016-11-21 21:40   ` Christopher Covington
  2016-11-21 22:49     ` Wei Huang
  0 siblings, 1 reply; 8+ messages in thread
From: Christopher Covington @ 2016-11-21 21:40 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Wei,

On 11/21/2016 03:24 PM, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>

I really appreciate your work on these patches. If for any or all of these
you have more lines added/modified than me (or using any other better
metric), please make sure to change the author to be you with
`git commit --amend --reset-author` or equivalent.

> Calculate the number of cycles per instruction (CPI) implied by ARM
> PMU cycle counter values. The code includes a strict checking facility
> intended for TCG's -icount mode; it is enabled via the configuration file.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>
> ---
>  arm/pmu.c         | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  arm/unittests.cfg |  14 +++++++
>  2 files changed, 132 insertions(+), 1 deletion(-)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 176b070..129ef1e 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -104,6 +104,25 @@ static inline uint32_t id_dfr0_read(void)
>  	asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
>  	return val;
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting. Total cycles = isb + mcr + 2*loop = 2 + 2*loop.
> + */
> +static inline void precise_cycles_loop(int loop, uint32_t pmcr)

Nit: I would call this precise_instrs_loop. How many cycles it takes is
IMPLEMENTATION DEFINED.

> +{
> +	asm volatile(
> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
> +	"	isb\n"
> +	"1:	subs	%[loop], %[loop], #1\n"
> +	"	bgt	1b\n"

Is there any chance we might need an isb here, to prevent the stop from happening
before or during the loop? Where ISBs are required, the Linux best practice is to
diligently comment why they are needed. Perhaps it would be a good habit to
carry over into kvm-unit-tests.

> +	"	mcr	p15, 0, %[z], c9, c12, 0\n"
> +	"	isb\n"
> +	: [loop] "+r" (loop)
> +	: [pmcr] "r" (pmcr), [z] "r" (0)
> +	: "cc");
> +}
>  #elif defined(__aarch64__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -150,6 +169,25 @@ static inline uint32_t id_dfr0_read(void)
>  	asm volatile("mrs %0, id_dfr0_el1" : "=r" (id));
>  	return id;
>  }
> +
> +/*
> + * Extra instructions inserted by the compiler would be difficult to compensate
> + * for, so hand assemble everything between, and including, the PMCR accesses
> + * to start and stop counting. Total cycles = isb + msr + 2*loop = 2 + 2*loop.
> + */
> +static inline void precise_cycles_loop(int loop, uint32_t pmcr)
> +{
> +	asm volatile(
> +	"	msr	pmcr_el0, %[pmcr]\n"
> +	"	isb\n"
> +	"1:	subs	%[loop], %[loop], #1\n"
> +	"	b.gt	1b\n"
> +	"	msr	pmcr_el0, xzr\n"
> +	"	isb\n"
> +	: [loop] "+r" (loop)
> +	: [pmcr] "r" (pmcr)
> +	: "cc");
> +}
>  #endif
>  
>  /*
> @@ -208,6 +246,79 @@ static bool check_cycles_increase(void)
>  	return success;
>  }
>  
> +/*
> + * Execute a known number of guest instructions. Only odd instruction counts
> + * greater than or equal to 3 are supported by the in-line assembly code. The

Nit: needs updating as well (or removal if you prefer)

> + * control register (PMCR_EL0) is initialized with the provided value (allowing
> + * for example for the cycle counter or event counters to be reset). At the end
> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> + * counting, allowing the cycle counter or event counters to be read at the
> + * leisure of the calling code.
> + */
> +static void measure_instrs(int num, uint32_t pmcr)
> +{
> +	int loop = (num - 2) / 2;
> +
> +	assert(num >= 4 && ((num - 2) % 2 == 0));
> +	precise_cycles_loop(loop, pmcr);
> +}
> +
> +/*
> + * Measure cycle counts for various known instruction counts. Ensure that the
> + * cycle counter progresses (similar to check_cycles_increase() but with more
> + * instructions and using reset and stop controls). If supplied a positive,
> + * nonzero CPI parameter, also strictly check that every measurement matches
> + * it. Strict CPI checking is used to test -icount mode.
> + */
> +static bool check_cpi(int cpi)
> +{
> +	uint32_t pmcr = pmcr_read() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E;
> +
> +	/* init before event access, this test only cares about cycle count */
> +	pmcntenset_write(1 << PMU_CYCLE_IDX);
> +	pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */
> +
> +	if (cpi > 0)
> +		printf("Checking for CPI=%d.\n", cpi);
> +	printf("instrs : cycles0 cycles1 ...\n");
> +
> +	for (unsigned int i = 4; i < 300; i += 32) {
> +		uint64_t avg, sum = 0;
> +
> +		printf("%d :", i);
> +		for (int j = 0; j < NR_SAMPLES; j++) {
> +			uint64_t cycles;
> +
> +			pmccntr_write(0);
> +			measure_instrs(i, pmcr);
> +			cycles = pmccntr_read();
> +			printf(" %"PRId64"", cycles);
> +
> +			if (!cycles) {
> +				printf("\ncycles not incrementing!\n");
> +				return false;
> +			} else if (cpi > 0 && cycles != i * cpi) {
> +				printf("\nunexpected cycle count received!\n");
> +				return false;
> +			} else if ((cycles >> 32) != 0) {
> +				/* The cycles taken by the loop above should
> +				 * fit in 32 bits easily. We check the upper
> +				 * 32 bits of the cycle counter to make sure
> +				 * there is no surprise. */
> +				printf("\ncycle count bigger than 32bit!\n");
> +				return false;
> +			}
> +
> +			sum += cycles;
> +		}
> +		avg = sum / NR_SAMPLES;
> +		printf(" sum=%"PRId64" avg=%"PRId64" avg_ipc=%"PRId64" "
> +		       "avg_cpi=%"PRId64"\n", sum, avg, i / avg, avg / i);
> +	}
> +
> +	return true;
> +}
> +
>  void pmu_init(void)
>  {
>  	uint32_t dfr0;
> @@ -218,13 +329,19 @@ void pmu_init(void)
>  	printf("PMU version: %d\n", pmu_version);
>  }

This is clearly the right feature register to check. Sorry to have
gotten excited about an inferior method.

Cov

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.
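For reference, pmu_init()'s probe maps directly onto the ID_DFR0.PerfMon
encodings given in the ARM Architecture Reference Manual (0x1 = PMUv1,
0x2 = PMUv2, 0x3 = PMUv3, 0xF = IMPLEMENTATION DEFINED). A small sketch
of that mapping, using a made-up register value for illustration:

	#include <stdint.h>
	#include <stdio.h>

	#define ID_DFR0_PERFMON_SHIFT 24
	#define ID_DFR0_PERFMON_MASK  0xf

	int main(void)
	{
		/* hypothetical ID_DFR0 value with PerfMon = 0x3 */
		uint32_t dfr0 = 0x03010066;
		uint32_t perfmon = (dfr0 >> ID_DFR0_PERFMON_SHIFT) &
				   ID_DFR0_PERFMON_MASK;

		switch (perfmon) {
		case 0x0: puts("PMU not implemented"); break;
		case 0x1: puts("PMUv1"); break;
		case 0x2: puts("PMUv2"); break;
		case 0x3: puts("PMUv3"); break;
		case 0xf: puts("IMPLEMENTATION DEFINED PMU"); break;
		default:  printf("unknown PerfMon value 0x%x\n", perfmon);
		}
		return 0;
	}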


* [kvm-unit-tests PATCH v10 3/3] arm: pmu: Add CPI checking
  2016-11-21 21:40   ` Christopher Covington
@ 2016-11-21 22:49     ` Wei Huang
  2016-11-22 12:52       ` Andrew Jones
  0 siblings, 1 reply; 8+ messages in thread
From: Wei Huang @ 2016-11-21 22:49 UTC (permalink / raw)
  To: linux-arm-kernel



On 11/21/2016 03:40 PM, Christopher Covington wrote:
> Hi Wei,
> 
> On 11/21/2016 03:24 PM, Wei Huang wrote:
>> From: Christopher Covington <cov@codeaurora.org>
> 
> I really appreciate your work on these patches. If for any or all of these
> you have more lines added/modified than me (or using any other better
> metric), please make sure to change the author to be you with
> `git commit --amend --reset-author` or equivalent.

Sure, I will if needed. Regarding your comments below, I will fix the
patch series after Drew's comments, if any.

> 
>> Calculate the number of cycles per instruction (CPI) implied by ARM
>> PMU cycle counter values. The code includes a strict checking facility
>> intended for TCG's -icount mode; it is enabled via the configuration file.
>>
>> Signed-off-by: Christopher Covington <cov@codeaurora.org>
>> Signed-off-by: Wei Huang <wei@redhat.com>
>> ---
>>  arm/pmu.c         | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  arm/unittests.cfg |  14 +++++++
>>  2 files changed, 132 insertions(+), 1 deletion(-)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 176b070..129ef1e 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -104,6 +104,25 @@ static inline uint32_t id_dfr0_read(void)
>>  	asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
>>  	return val;
>>  }
>> +
>> +/*
>> + * Extra instructions inserted by the compiler would be difficult to compensate
>> + * for, so hand assemble everything between, and including, the PMCR accesses
>> + * to start and stop counting. Total cycles = isb + mcr + 2*loop = 2 + 2*loop.
                                   ^^^^^^^^^^^^
I will change the comment above to "Total instrs".

>> + */
>> +static inline void precise_cycles_loop(int loop, uint32_t pmcr)
> 
> Nit: I would call this precise_instrs_loop. How many cycles it takes is
> IMPLEMENTATION DEFINED.

You are right. The cycle indeed depends on the design. Will fix.

> 
>> +{
>> +	asm volatile(
>> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
>> +	"	isb\n"
>> +	"1:	subs	%[loop], %[loop], #1\n"
>> +	"	bgt	1b\n"
> 
> Is there any chance we might need an isb here, to prevent the stop from happening
> before or during the loop? Where ISBs are required, the Linux best practice is to

In theory, I think this can happen if the mcr is executed before all loop
instructions have completed, causing pmccntr_read() to miss some cycles. But
QEMU TCG mode doesn't support out-of-order execution, so the test
condition, "cpi > 0 && cycles != i * cpi", will never be TRUE. Because
cpi==0 in KVM, this same test condition won't be TRUE under KVM mode either.

> diligently comment why they are needed. Perhaps it would be a good habit to
> carry over into kvm-unit-tests.

Agreed. Most isb() instructions were added following CP15 writes (not
all CP15 writes, but at limited locations). We tried to follow what
Linux kernel does in perf_event.c. If you feel that any isb() place
needs special comment, I will be more than happy to add it.
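For concreteness, the placement under discussion would look like the
sketch below (ARMv7 variant; the function name is hypothetical and this
is not what the patch currently does). Note that the extra barrier would
also change the instruction accounting from 2 + 2*loop to 3 + 2*loop:

	static inline void precise_instrs_loop_with_isb(int loop, uint32_t pmcr)
	{
		asm volatile(
		"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
		"	isb\n"
		"1:	subs	%[loop], %[loop], #1\n"
		"	bgt	1b\n"
		/* candidate barrier: complete the loop before counting stops */
		"	isb\n"
		"	mcr	p15, 0, %[z], c9, c12, 0\n"
		"	isb\n"
		: [loop] "+r" (loop)
		: [pmcr] "r" (pmcr), [z] "r" (0)
		: "cc");
	}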

<snip>


* [kvm-unit-tests PATCH v10 2/3] arm: pmu: Check cycle count increases
  2016-11-21 20:24 ` [kvm-unit-tests PATCH v10 2/3] arm: pmu: Check cycle count increases Wei Huang
@ 2016-11-22 12:42   ` Andrew Jones
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Jones @ 2016-11-22 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 21, 2016 at 02:24:54PM -0600, Wei Huang wrote:
> From: Christopher Covington <cov@codeaurora.org>
> 
> Ensure that reads of the PMCCNTR_EL0 are monotonically increasing,
> even for the smallest delta of two subsequent reads.
> 
> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> Signed-off-by: Wei Huang <wei@redhat.com>

Reviewed-by: Andrew Jones <drjones@redhat.com>

> ---
>  arm/pmu.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 156 insertions(+)
> 
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 9d9c53b..176b070 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -15,6 +15,9 @@
>  #include "libcflat.h"
>  #include "asm/barrier.h"
>  
> +#define PMU_PMCR_E         (1 << 0)
> +#define PMU_PMCR_C         (1 << 2)
> +#define PMU_PMCR_LC        (1 << 6)
>  #define PMU_PMCR_N_SHIFT   11
>  #define PMU_PMCR_N_MASK    0x1f
>  #define PMU_PMCR_ID_SHIFT  16
> @@ -22,6 +25,14 @@
>  #define PMU_PMCR_IMP_SHIFT 24
>  #define PMU_PMCR_IMP_MASK  0xff
>  
> +#define ID_DFR0_PERFMON_SHIFT 24
> +#define ID_DFR0_PERFMON_MASK  0xf
> +
> +#define PMU_CYCLE_IDX         31
> +
> +#define NR_SAMPLES 10
> +
> +static unsigned int pmu_version;
>  #if defined(__arm__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -30,6 +41,69 @@ static inline uint32_t pmcr_read(void)
>  	asm volatile("mrc p15, 0, %0, c9, c12, 0" : "=r" (ret));
>  	return ret;
>  }
> +
> +static inline void pmcr_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c12, 0" : : "r" (value));
> +	isb();
> +}
> +
> +static inline void pmselr_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (value));
> +	isb();
> +}
> +
> +static inline void pmxevtyper_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c13, 1" : : "r" (value));
> +}
> +
> +static inline uint64_t pmccntr_read(void)
> +{
> +	uint32_t lo, hi = 0;
> +
> +	if (pmu_version == 0x3)
> +		asm volatile("mrrc p15, 0, %0, %1, c9" : "=r" (lo), "=r" (hi));
> +	else
> +		asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (lo));
> +
> +	return ((uint64_t)hi << 32) | lo;
> +}
> +
> +static inline void pmccntr_write(uint64_t value)
> +{
> +	uint32_t lo, hi;
> +
> +	lo = value & 0xffffffff;
> +	hi = (value >> 32) & 0xffffffff;
> +
> +	if (pmu_version == 0x3)
> +		asm volatile("mcrr p15, 0, %0, %1, c9" : : "r" (lo), "r" (hi));
> +	else
> +		asm volatile("mcr p15, 0, %0, c9, c13, 0" : : "r" (lo));
> +}
> +
> +static inline void pmcntenset_write(uint32_t value)
> +{
> +	asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (value));
> +}
> +
> +/* PMCCFILTR is an obsolete name for PMXEVTYPER31 in ARMv7 */
> +static inline void pmccfiltr_write(uint32_t value)
> +{
> +	pmselr_write(PMU_CYCLE_IDX);
> +	pmxevtyper_write(value);
> +	isb();
> +}
> +
> +static inline uint32_t id_dfr0_read(void)
> +{
> +	uint32_t val;
> +
> +	asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
> +	return val;
> +}
>  #elif defined(__aarch64__)
>  static inline uint32_t pmcr_read(void)
>  {
> @@ -38,6 +112,44 @@ static inline uint32_t pmcr_read(void)
>  	asm volatile("mrs %0, pmcr_el0" : "=r" (ret));
>  	return ret;
>  }
> +
> +static inline void pmcr_write(uint32_t value)
> +{
> +	asm volatile("msr pmcr_el0, %0" : : "r" (value));
> +	isb();
> +}
> +
> +static inline uint64_t pmccntr_read(void)
> +{
> +	uint64_t cycles;
> +
> +	asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
> +	return cycles;
> +}
> +
> +static inline void pmccntr_write(uint64_t value)
> +{
> +	asm volatile("msr pmccntr_el0, %0" : : "r" (value));
> +}
> +
> +static inline void pmcntenset_write(uint32_t value)
> +{
> +	asm volatile("msr pmcntenset_el0, %0" : : "r" (value));
> +}
> +
> +static inline void pmccfiltr_write(uint32_t value)
> +{
> +	asm volatile("msr pmccfiltr_el0, %0" : : "r" (value));
> +	isb();
> +}
> +
> +static inline uint32_t id_dfr0_read(void)
> +{
> +	uint32_t id;
> +
> +	asm volatile("mrs %0, id_dfr0_el1" : "=r" (id));
> +	return id;
> +}
>  #endif
>  
>  /*
> @@ -64,11 +176,55 @@ static bool check_pmcr(void)
>  	return ((pmcr >> PMU_PMCR_IMP_SHIFT) & PMU_PMCR_IMP_MASK) != 0;
>  }
>  
> +/*
> + * Ensure that the cycle counter progresses between back-to-back reads.
> + */
> +static bool check_cycles_increase(void)
> +{
> +	bool success = true;
> +
> +	/* init before event access, this test only cares about cycle count */
> +	pmcntenset_write(1 << PMU_CYCLE_IDX);
> +	pmccfiltr_write(0); /* count cycles in EL0, EL1, but not EL2 */
> +	pmccntr_write(0);
> +
> +	pmcr_write(pmcr_read() | PMU_PMCR_LC | PMU_PMCR_C | PMU_PMCR_E);
> +
> +	for (int i = 0; i < NR_SAMPLES; i++) {
> +		uint64_t a, b;
> +
> +		a = pmccntr_read();
> +		b = pmccntr_read();
> +
> +		if (a >= b) {
> +			printf("Read %"PRId64" then %"PRId64".\n", a, b);
> +			success = false;
> +			break;
> +		}
> +	}
> +
> +	pmcr_write(pmcr_read() & ~PMU_PMCR_E);
> +
> +	return success;
> +}
> +
> +void pmu_init(void)
> +{
> +	uint32_t dfr0;
> +
> +	/* probe pmu version */
> +	dfr0 = id_dfr0_read();
> +	pmu_version = (dfr0 >> ID_DFR0_PERFMON_SHIFT) & ID_DFR0_PERFMON_MASK;
> +	printf("PMU version: %d\n", pmu_version);
> +}
> +
>  int main(void)
>  {
>  	report_prefix_push("pmu");
>  
> +	pmu_init();
>  	report("Control register", check_pmcr());
> +	report("Monotonically increasing cycle count", check_cycles_increase());
>  
>  	return report_summary();
>  }
> -- 
> 1.8.3.1
> 


* [kvm-unit-tests PATCH v10 3/3] arm: pmu: Add CPI checking
  2016-11-21 22:49     ` Wei Huang
@ 2016-11-22 12:52       ` Andrew Jones
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Jones @ 2016-11-22 12:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Nov 21, 2016 at 04:49:20PM -0600, Wei Huang wrote:
> 
> 
> On 11/21/2016 03:40 PM, Christopher Covington wrote:
> > Hi Wei,
> > 
> > On 11/21/2016 03:24 PM, Wei Huang wrote:
> >> From: Christopher Covington <cov@codeaurora.org>
> > 
> > I really appreciate your work on these patches. If for any or all of these
> > you have more lines added/modified than me (or using any other better
> > metric), please make sure to change the author to be you with
> > `git commit --amend --reset-author` or equivalent.
> 
> Sure, I will if needed. Regarding your comments below, I will fix the
> patch series after Drew's comments, if any.
> 
> > 
> >> Calculate the number of cycles per instruction (CPI) implied by ARM
> >> PMU cycle counter values. The code includes a strict checking facility
> >> intended for TCG's -icount mode; it is enabled via the configuration file.
> >>
> >> Signed-off-by: Christopher Covington <cov@codeaurora.org>
> >> Signed-off-by: Wei Huang <wei@redhat.com>
> >> ---
> >>  arm/pmu.c         | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>  arm/unittests.cfg |  14 +++++++
> >>  2 files changed, 132 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arm/pmu.c b/arm/pmu.c
> >> index 176b070..129ef1e 100644
> >> --- a/arm/pmu.c
> >> +++ b/arm/pmu.c
> >> @@ -104,6 +104,25 @@ static inline uint32_t id_dfr0_read(void)
> >>  	asm volatile("mrc p15, 0, %0, c0, c1, 2" : "=r" (val));
> >>  	return val;
> >>  }
> >> +
> >> +/*
> >> + * Extra instructions inserted by the compiler would be difficult to compensate
> >> + * for, so hand assemble everything between, and including, the PMCR accesses
> >> + * to start and stop counting. Total cycles = isb + mcr + 2*loop = 2 + 2*loop.
>                                    ^^^^^^^^^^^^
> I will change the comment above to "Total instrs".
> 
> >> + */
> >> +static inline void precise_cycles_loop(int loop, uint32_t pmcr)
> > 
> > Nit: I would call this precise_instrs_loop. How many cycles it takes is
> > IMPLEMENTATION DEFINED.
> 
> You are right. The cycle indeed depends on the design. Will fix.
> 
> > 
> >> +{
> >> +	asm volatile(
> >> +	"	mcr	p15, 0, %[pmcr], c9, c12, 0\n"
> >> +	"	isb\n"
> >> +	"1:	subs	%[loop], %[loop], #1\n"
> >> +	"	bgt	1b\n"
> > 
> > Is there any chance we might need an isb here, to prevent the stop from happening
> > before or during the loop? Where ISBs are required, the Linux best practice is to
> 
> In theory, I think this can happen if the mcr is executed before all loop
> instructions have completed, causing pmccntr_read() to miss some cycles. But
> QEMU TCG mode doesn't support out-of-order execution, so the test
> condition, "cpi > 0 && cycles != i * cpi", will never be TRUE. Because
> cpi==0 in KVM, this same test condition won't be TRUE under KVM mode either.
> 
> > diligently comment why they are needed. Perhaps it would be a good habit to
> > carry over into kvm-unit-tests.
> 
> Agreed. Most isb() instructions were added following CP15 writes (not
> all CP15 writes, but at limited locations). We tried to follow what
> Linux kernel does in perf_event.c. If you feel that any isb() place
> needs special comment, I will be more than happy to add it.
> 
> <snip>

No new comments from me. Thanks guys for catching the need to update the
comments.

drew

