* [kvm-unit-tests PATCH v3 1/6] arm: pmu: pmu-chain-promotion: Improve debug messages
2023-06-19 20:03 [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion Eric Auger
@ 2023-06-19 20:03 ` Eric Auger
2023-06-19 20:03 ` [kvm-unit-tests PATCH v3 2/6] arm: pmu: pmu-chain-promotion: Introduce defines for count and margin values Eric Auger
` (6 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2023-06-19 20:03 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, alexandru.elisei
Cc: mark.rutland
The pmu-chain-promotion test is composed of several subtests.
In case of failures, the current logs are really difficult to
analyze since they look very similar and are sometimes duplicated
for each subtest. Add prefixes for each subtest and introduce
a macro that prints the registers we are mostly interested in,
namely the first two event counters and the overflow status register.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
v1 -> v2:
- Added Alexandru's R-b
- s/2d/2nd
---
arm/pmu.c | 63 ++++++++++++++++++++++++++++---------------------------
1 file changed, 32 insertions(+), 31 deletions(-)
diff --git a/arm/pmu.c b/arm/pmu.c
index f6e95012..cc2e733e 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -715,6 +715,11 @@ static void test_chained_sw_incr(bool unused)
report_info("overflow=0x%lx, #0=0x%lx #1=0x%lx", read_sysreg(pmovsclr_el0),
read_regn_el0(pmevcntr, 0), read_regn_el0(pmevcntr, 1));
}
+#define PRINT_REGS(__s) \
+ report_info("%s #1=0x%lx #0=0x%lx overflow=0x%lx", __s, \
+ read_regn_el0(pmevcntr, 1), \
+ read_regn_el0(pmevcntr, 0), \
+ read_sysreg(pmovsclr_el0))
static void test_chain_promotion(bool unused)
{
@@ -725,6 +730,7 @@ static void test_chain_promotion(bool unused)
return;
/* Only enable CHAIN counter */
+ report_prefix_push("subtest1");
pmu_reset();
write_regn_el0(pmevtyper, 0, MEM_ACCESS | PMEVTYPER_EXCLUDE_EL0);
write_regn_el0(pmevtyper, 1, CHAIN | PMEVTYPER_EXCLUDE_EL0);
@@ -732,83 +738,81 @@ static void test_chain_promotion(bool unused)
isb();
mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ PRINT_REGS("post");
report(!read_regn_el0(pmevcntr, 0),
"chain counter not counting if even counter is disabled");
+ report_prefix_pop();
/* Only enable even counter */
+ report_prefix_push("subtest2");
pmu_reset();
write_regn_el0(pmevcntr, 0, PRE_OVERFLOW_32);
write_sysreg_s(0x1, PMCNTENSET_EL0);
isb();
mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ PRINT_REGS("post");
report(!read_regn_el0(pmevcntr, 1) && (read_sysreg(pmovsclr_el0) == 0x1),
"odd counter did not increment on overflow if disabled");
- report_info("MEM_ACCESS counter #0 has value 0x%lx",
- read_regn_el0(pmevcntr, 0));
- report_info("CHAIN counter #1 has value 0x%lx",
- read_regn_el0(pmevcntr, 1));
- report_info("overflow counter 0x%lx", read_sysreg(pmovsclr_el0));
+ report_prefix_pop();
/* start at 0xFFFFFFDC, +20 with CHAIN enabled, +20 with CHAIN disabled */
+ report_prefix_push("subtest3");
pmu_reset();
write_sysreg_s(0x3, PMCNTENSET_EL0);
write_regn_el0(pmevcntr, 0, PRE_OVERFLOW2_32);
isb();
+ PRINT_REGS("init");
mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
- report_info("MEM_ACCESS counter #0 has value 0x%lx",
- read_regn_el0(pmevcntr, 0));
+ PRINT_REGS("After 1st loop");
/* disable the CHAIN event */
write_sysreg_s(0x2, PMCNTENCLR_EL0);
mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
- report_info("MEM_ACCESS counter #0 has value 0x%lx",
- read_regn_el0(pmevcntr, 0));
+ PRINT_REGS("After 2nd loop");
report(read_sysreg(pmovsclr_el0) == 0x1,
"should have triggered an overflow on #0");
report(!read_regn_el0(pmevcntr, 1),
"CHAIN counter #1 shouldn't have incremented");
+ report_prefix_pop();
/* start at 0xFFFFFFDC, +20 with CHAIN disabled, +20 with CHAIN enabled */
+ report_prefix_push("subtest4");
pmu_reset();
write_sysreg_s(0x1, PMCNTENSET_EL0);
write_regn_el0(pmevcntr, 0, PRE_OVERFLOW2_32);
isb();
- report_info("counter #0 = 0x%lx, counter #1 = 0x%lx overflow=0x%lx",
- read_regn_el0(pmevcntr, 0), read_regn_el0(pmevcntr, 1),
- read_sysreg(pmovsclr_el0));
+ PRINT_REGS("init");
mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
- report_info("MEM_ACCESS counter #0 has value 0x%lx",
- read_regn_el0(pmevcntr, 0));
+ PRINT_REGS("After 1st loop");
/* enable the CHAIN event */
write_sysreg_s(0x3, PMCNTENSET_EL0);
isb();
mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
- report_info("MEM_ACCESS counter #0 has value 0x%lx",
- read_regn_el0(pmevcntr, 0));
+
+ PRINT_REGS("After 2nd loop");
report((read_regn_el0(pmevcntr, 1) == 1) &&
(read_sysreg(pmovsclr_el0) == 0x1),
"CHAIN counter enabled: CHAIN counter was incremented and overflow");
-
- report_info("CHAIN counter #1 = 0x%lx, overflow=0x%lx",
- read_regn_el0(pmevcntr, 1), read_sysreg(pmovsclr_el0));
+ report_prefix_pop();
/* start as MEM_ACCESS/CPU_CYCLES and move to CHAIN/MEM_ACCESS */
+ report_prefix_push("subtest5");
pmu_reset();
write_regn_el0(pmevtyper, 0, MEM_ACCESS | PMEVTYPER_EXCLUDE_EL0);
write_regn_el0(pmevtyper, 1, CPU_CYCLES | PMEVTYPER_EXCLUDE_EL0);
write_sysreg_s(0x3, PMCNTENSET_EL0);
write_regn_el0(pmevcntr, 0, PRE_OVERFLOW2_32);
isb();
+ PRINT_REGS("init");
mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
- report_info("MEM_ACCESS counter #0 has value 0x%lx",
- read_regn_el0(pmevcntr, 0));
+ PRINT_REGS("After 1st loop");
/* 0 becomes CHAINED */
write_sysreg_s(0x0, PMCNTENSET_EL0);
@@ -817,37 +821,34 @@ static void test_chain_promotion(bool unused)
write_regn_el0(pmevcntr, 1, 0x0);
mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
- report_info("MEM_ACCESS counter #0 has value 0x%lx",
- read_regn_el0(pmevcntr, 0));
+ PRINT_REGS("After 2nd loop");
report((read_regn_el0(pmevcntr, 1) == 1) &&
(read_sysreg(pmovsclr_el0) == 0x1),
"32b->64b: CHAIN counter incremented and overflow");
-
- report_info("CHAIN counter #1 = 0x%lx, overflow=0x%lx",
- read_regn_el0(pmevcntr, 1), read_sysreg(pmovsclr_el0));
+ report_prefix_pop();
/* start as CHAIN/MEM_ACCESS and move to MEM_ACCESS/CPU_CYCLES */
+ report_prefix_push("subtest6");
pmu_reset();
write_regn_el0(pmevtyper, 0, MEM_ACCESS | PMEVTYPER_EXCLUDE_EL0);
write_regn_el0(pmevtyper, 1, CHAIN | PMEVTYPER_EXCLUDE_EL0);
write_regn_el0(pmevcntr, 0, PRE_OVERFLOW2_32);
write_sysreg_s(0x3, PMCNTENSET_EL0);
+ PRINT_REGS("init");
mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
- report_info("counter #0=0x%lx, counter #1=0x%lx",
- read_regn_el0(pmevcntr, 0), read_regn_el0(pmevcntr, 1));
+ PRINT_REGS("After 1st loop");
write_sysreg_s(0x0, PMCNTENSET_EL0);
write_regn_el0(pmevtyper, 1, CPU_CYCLES | PMEVTYPER_EXCLUDE_EL0);
write_sysreg_s(0x3, PMCNTENSET_EL0);
mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ PRINT_REGS("After 2nd loop");
report(read_sysreg(pmovsclr_el0) == 1,
"overflow is expected on counter 0");
- report_info("counter #0=0x%lx, counter #1=0x%lx overflow=0x%lx",
- read_regn_el0(pmevcntr, 0), read_regn_el0(pmevcntr, 1),
- read_sysreg(pmovsclr_el0));
+ report_prefix_pop();
}
static bool expect_interrupts(uint32_t bitmap)
--
2.38.1
* [kvm-unit-tests PATCH v3 2/6] arm: pmu: pmu-chain-promotion: Introduce defines for count and margin values
2023-06-19 20:03 [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion Eric Auger
2023-06-19 20:03 ` [kvm-unit-tests PATCH v3 1/6] arm: pmu: pmu-chain-promotion: Improve debug messages Eric Auger
@ 2023-06-19 20:03 ` Eric Auger
2023-06-19 20:03 ` [kvm-unit-tests PATCH v3 3/6] arm: pmu: Add extra DSB barriers in the mem_access loop Eric Auger
` (5 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2023-06-19 20:03 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, alexandru.elisei
Cc: mark.rutland
The pmu-chain-promotion test is composed of separate subtests.
Some of them apply some settings for a first loop of MEM_ACCESS
iterations, change the settings and run another MEM_ACCESS loop.
The PRE_OVERFLOW2 MEM_ACCESS counter init value is defined so that
the first loop does not overflow and the second loop overflows.
At the moment the MEM_ACCESS count number is hardcoded to 20 and
PRE_OVERFLOW2 is set to UINT32_MAX - 20 - 15 where 15 acts as a
margin.
Introduce defines for the count number and the margin so that it
becomes easier to change them.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
v1 -> v2:
- 2d -> 2nd and use @COUNT in comment
- Added Alexandru's R-b
---
arm/pmu.c | 35 +++++++++++++++++++++--------------
1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/arm/pmu.c b/arm/pmu.c
index cc2e733e..51c0fe80 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -55,11 +55,18 @@
#define EXT_COMMON_EVENTS_LOW 0x4000
#define EXT_COMMON_EVENTS_HIGH 0x403F
-#define ALL_SET_32 0x00000000FFFFFFFFULL
+#define ALL_SET_32 0x00000000FFFFFFFFULL
#define ALL_CLEAR 0x0000000000000000ULL
#define PRE_OVERFLOW_32 0x00000000FFFFFFF0ULL
-#define PRE_OVERFLOW2_32 0x00000000FFFFFFDCULL
#define PRE_OVERFLOW_64 0xFFFFFFFFFFFFFFF0ULL
+#define COUNT 20
+#define MARGIN 15
+/*
+ * PRE_OVERFLOW2 is set so that 1st @COUNT iterations do not
+ * produce 32b overflow and 2nd @COUNT iterations do. To accommodate
+ * for some observed variability we take into account a given @MARGIN
+ */
+#define PRE_OVERFLOW2_32 (ALL_SET_32 - COUNT - MARGIN)
#define PRE_OVERFLOW(__overflow_at_64bits) \
(__overflow_at_64bits ? PRE_OVERFLOW_64 : PRE_OVERFLOW_32)
@@ -737,7 +744,7 @@ static void test_chain_promotion(bool unused)
write_sysreg_s(0x2, PMCNTENSET_EL0);
isb();
- mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("post");
report(!read_regn_el0(pmevcntr, 0),
"chain counter not counting if even counter is disabled");
@@ -750,13 +757,13 @@ static void test_chain_promotion(bool unused)
write_sysreg_s(0x1, PMCNTENSET_EL0);
isb();
- mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("post");
report(!read_regn_el0(pmevcntr, 1) && (read_sysreg(pmovsclr_el0) == 0x1),
"odd counter did not increment on overflow if disabled");
report_prefix_pop();
- /* start at 0xFFFFFFDC, +20 with CHAIN enabled, +20 with CHAIN disabled */
+ /* 1st COUNT with CHAIN enabled, next COUNT with CHAIN disabled */
report_prefix_push("subtest3");
pmu_reset();
write_sysreg_s(0x3, PMCNTENSET_EL0);
@@ -764,12 +771,12 @@ static void test_chain_promotion(bool unused)
isb();
PRINT_REGS("init");
- mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 1st loop");
/* disable the CHAIN event */
write_sysreg_s(0x2, PMCNTENCLR_EL0);
- mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 2nd loop");
report(read_sysreg(pmovsclr_el0) == 0x1,
"should have triggered an overflow on #0");
@@ -777,7 +784,7 @@ static void test_chain_promotion(bool unused)
"CHAIN counter #1 shouldn't have incremented");
report_prefix_pop();
- /* start at 0xFFFFFFDC, +20 with CHAIN disabled, +20 with CHAIN enabled */
+ /* 1st COUNT with CHAIN disabled, next COUNT with CHAIN enabled */
report_prefix_push("subtest4");
pmu_reset();
@@ -786,13 +793,13 @@ static void test_chain_promotion(bool unused)
isb();
PRINT_REGS("init");
- mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 1st loop");
/* enable the CHAIN event */
write_sysreg_s(0x3, PMCNTENSET_EL0);
isb();
- mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 2nd loop");
@@ -811,7 +818,7 @@ static void test_chain_promotion(bool unused)
isb();
PRINT_REGS("init");
- mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 1st loop");
/* 0 becomes CHAINED */
@@ -820,7 +827,7 @@ static void test_chain_promotion(bool unused)
write_sysreg_s(0x3, PMCNTENSET_EL0);
write_regn_el0(pmevcntr, 1, 0x0);
- mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 2nd loop");
report((read_regn_el0(pmevcntr, 1) == 1) &&
@@ -837,14 +844,14 @@ static void test_chain_promotion(bool unused)
write_sysreg_s(0x3, PMCNTENSET_EL0);
PRINT_REGS("init");
- mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 1st loop");
write_sysreg_s(0x0, PMCNTENSET_EL0);
write_regn_el0(pmevtyper, 1, CPU_CYCLES | PMEVTYPER_EXCLUDE_EL0);
write_sysreg_s(0x3, PMCNTENSET_EL0);
- mem_access_loop(addr, 20, pmu.pmcr_ro | PMU_PMCR_E);
+ mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 2nd loop");
report(read_sysreg(pmovsclr_el0) == 1,
"overflow is expected on counter 0");
--
2.38.1
* [kvm-unit-tests PATCH v3 3/6] arm: pmu: Add extra DSB barriers in the mem_access loop
2023-06-19 20:03 [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion Eric Auger
2023-06-19 20:03 ` [kvm-unit-tests PATCH v3 1/6] arm: pmu: pmu-chain-promotion: Improve debug messages Eric Auger
2023-06-19 20:03 ` [kvm-unit-tests PATCH v3 2/6] arm: pmu: pmu-chain-promotion: Introduce defines for count and margin values Eric Auger
@ 2023-06-19 20:03 ` Eric Auger
2023-06-19 20:03 ` [kvm-unit-tests PATCH v3 4/6] arm: pmu: Fix chain counter enable/disable sequences Eric Auger
` (4 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2023-06-19 20:03 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, alexandru.elisei
Cc: mark.rutland
The mem_access loop currently features ISB barriers only. However
the mem_access loop counts the number of accesses to memory, and
an ISB does not guarantee that the PE cannot reorder memory
accesses. Add a DSB ISH before the write to PMCR_EL0 that enables
the PMU, to make sure any previous memory accesses aren't counted
in the loop; another one after the PMU gets enabled, to make sure
the loop's memory accesses cannot be reordered before the PMU gets
enabled; and a last one after the last iteration, before disabling
the PMU.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
v2 -> v3:
- Added Alexandru's R-b
v1 -> v2:
- added yet another DSB after PMU enabled as suggested by Alexandru
This was discussed in https://lore.kernel.org/all/YzxmHpV2rpfaUdWi@monolith.localdoman/
---
arm/pmu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arm/pmu.c b/arm/pmu.c
index 51c0fe80..74dd4c10 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -301,13 +301,16 @@ static void mem_access_loop(void *addr, long loop, uint32_t pmcr)
{
uint64_t pmcr64 = pmcr;
asm volatile(
+ " dsb ish\n"
" msr pmcr_el0, %[pmcr]\n"
" isb\n"
+ " dsb ish\n"
" mov x10, %[loop]\n"
"1: sub x10, x10, #1\n"
" ldr x9, [%[addr]]\n"
" cmp x10, #0x0\n"
" b.gt 1b\n"
+ " dsb ish\n"
" msr pmcr_el0, xzr\n"
" isb\n"
:
--
2.38.1
* [kvm-unit-tests PATCH v3 4/6] arm: pmu: Fix chain counter enable/disable sequences
2023-06-19 20:03 [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion Eric Auger
` (2 preceding siblings ...)
2023-06-19 20:03 ` [kvm-unit-tests PATCH v3 3/6] arm: pmu: Add extra DSB barriers in the mem_access loop Eric Auger
@ 2023-06-19 20:03 ` Eric Auger
2023-06-30 15:47 ` Alexandru Elisei
2023-06-19 20:04 ` [kvm-unit-tests PATCH v3 5/6] arm: pmu: Add pmu-mem-access-reliability test Eric Auger
` (3 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: Eric Auger @ 2023-06-19 20:03 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, alexandru.elisei
Cc: mark.rutland
In some revisions of the ARM ARM (DDI 0487) it is said that
disabling/enabling a pair of counters that are paired
by a CHAIN event should follow a given sequence:

Enable the high counter first, isb, enable the low counter.
Disable the low counter first, isb, disable the high counter.

This was the case in revision F.c. However this is no longer
written in subsequent revisions such as I.a.

Anyway, just in case, and because it also makes the code a
little bit simpler, introduce 2 helpers to enable/disable chain
counters that execute those sequences, and replace the existing
PMCNTENCLR/ENSET calls (at least this cannot do any harm).

Also fix 2 write_sysreg_s(0x0, PMCNTENSET_EL0) calls in subtests
5 & 6 and replace them with PMCNTENCLR writes, since writing 0 to
PMCNTENSET_EL0 has no effect. While at it, also remove a useless
pmevtyper setting in test_chained_sw_incr() since the type of the
counter is not reset by pmu_reset()
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
v2 -> v3:
- use enable_chain_counter() in test_chain_promotion() and
test_chained_sw_incr() as well. This also fixes the missing
ISB reported by Alexandru.
- Also removes a useless pmevtyper setting in
test_chained_sw_incr()
v1 -> v2:
- fix the enable_chain_counter()/disable_chain_counter()
sequence, i.e. swap n + 1 / n as reported by Alexandru.
- fix another comment using the 'low' terminology
---
arm/pmu.c | 48 ++++++++++++++++++++++++++++++++----------------
1 file changed, 32 insertions(+), 16 deletions(-)
diff --git a/arm/pmu.c b/arm/pmu.c
index 74dd4c10..0995a249 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -631,6 +631,22 @@ static void test_sw_incr(bool overflow_at_64bits)
"overflow on counter #0 after 100 SW_INCR");
}
+static void enable_chain_counter(int even)
+{
+ write_sysreg_s(BIT(even + 1), PMCNTENSET_EL0); /* Enable the high counter first */
+ isb();
+ write_sysreg_s(BIT(even), PMCNTENSET_EL0); /* Enable the low counter */
+ isb();
+}
+
+static void disable_chain_counter(int even)
+{
+ write_sysreg_s(BIT(even), PMCNTENCLR_EL0); /* Disable the low counter first*/
+ isb();
+ write_sysreg_s(BIT(even + 1), PMCNTENCLR_EL0); /* Disable the high counter */
+ isb();
+}
+
static void test_chained_counters(bool unused)
{
uint32_t events[] = {CPU_CYCLES, CHAIN};
@@ -643,9 +659,8 @@ static void test_chained_counters(bool unused)
write_regn_el0(pmevtyper, 0, CPU_CYCLES | PMEVTYPER_EXCLUDE_EL0);
write_regn_el0(pmevtyper, 1, CHAIN | PMEVTYPER_EXCLUDE_EL0);
- /* enable counters #0 and #1 */
- write_sysreg_s(0x3, PMCNTENSET_EL0);
write_regn_el0(pmevcntr, 0, PRE_OVERFLOW_32);
+ enable_chain_counter(0);
precise_instrs_loop(22, pmu.pmcr_ro | PMU_PMCR_E);
@@ -655,10 +670,10 @@ static void test_chained_counters(bool unused)
/* test 64b overflow */
pmu_reset();
- write_sysreg_s(0x3, PMCNTENSET_EL0);
write_regn_el0(pmevcntr, 0, PRE_OVERFLOW_32);
write_regn_el0(pmevcntr, 1, 0x1);
+ enable_chain_counter(0);
precise_instrs_loop(22, pmu.pmcr_ro | PMU_PMCR_E);
report_info("overflow reg = 0x%lx", read_sysreg(pmovsclr_el0));
report(read_regn_el0(pmevcntr, 1) == 2, "CHAIN counter #1 set to 2");
@@ -687,8 +702,7 @@ static void test_chained_sw_incr(bool unused)
write_regn_el0(pmevtyper, 0, SW_INCR | PMEVTYPER_EXCLUDE_EL0);
write_regn_el0(pmevtyper, 1, CHAIN | PMEVTYPER_EXCLUDE_EL0);
- /* enable counters #0 and #1 */
- write_sysreg_s(0x3, PMCNTENSET_EL0);
+ enable_chain_counter(0);
write_regn_el0(pmevcntr, 0, PRE_OVERFLOW_32);
set_pmcr(pmu.pmcr_ro | PMU_PMCR_E);
@@ -707,10 +721,9 @@ static void test_chained_sw_incr(bool unused)
/* 64b SW_INCR and overflow on CHAIN counter*/
pmu_reset();
- write_regn_el0(pmevtyper, 1, events[1] | PMEVTYPER_EXCLUDE_EL0);
write_regn_el0(pmevcntr, 0, PRE_OVERFLOW_32);
write_regn_el0(pmevcntr, 1, ALL_SET_32);
- write_sysreg_s(0x3, PMCNTENSET_EL0);
+ enable_chain_counter(0);
set_pmcr(pmu.pmcr_ro | PMU_PMCR_E);
isb();
@@ -769,16 +782,17 @@ static void test_chain_promotion(bool unused)
/* 1st COUNT with CHAIN enabled, next COUNT with CHAIN disabled */
report_prefix_push("subtest3");
pmu_reset();
- write_sysreg_s(0x3, PMCNTENSET_EL0);
write_regn_el0(pmevcntr, 0, PRE_OVERFLOW2_32);
- isb();
+ enable_chain_counter(0);
PRINT_REGS("init");
mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 1st loop");
/* disable the CHAIN event */
- write_sysreg_s(0x2, PMCNTENCLR_EL0);
+ disable_chain_counter(0);
+ write_sysreg_s(0x1, PMCNTENSET_EL0); /* Enable the low counter */
+ isb();
mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 2nd loop");
report(read_sysreg(pmovsclr_el0) == 0x1,
@@ -799,9 +813,11 @@ static void test_chain_promotion(bool unused)
mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 1st loop");
- /* enable the CHAIN event */
- write_sysreg_s(0x3, PMCNTENSET_EL0);
+ /* Disable the low counter first and enable the chain counter */
+ write_sysreg_s(0x1, PMCNTENCLR_EL0);
isb();
+ enable_chain_counter(0);
+
mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 2nd loop");
@@ -825,10 +841,10 @@ static void test_chain_promotion(bool unused)
PRINT_REGS("After 1st loop");
/* 0 becomes CHAINED */
- write_sysreg_s(0x0, PMCNTENSET_EL0);
+ write_sysreg_s(0x3, PMCNTENCLR_EL0);
write_regn_el0(pmevtyper, 1, CHAIN | PMEVTYPER_EXCLUDE_EL0);
- write_sysreg_s(0x3, PMCNTENSET_EL0);
write_regn_el0(pmevcntr, 1, 0x0);
+ enable_chain_counter(0);
mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 2nd loop");
@@ -844,13 +860,13 @@ static void test_chain_promotion(bool unused)
write_regn_el0(pmevtyper, 0, MEM_ACCESS | PMEVTYPER_EXCLUDE_EL0);
write_regn_el0(pmevtyper, 1, CHAIN | PMEVTYPER_EXCLUDE_EL0);
write_regn_el0(pmevcntr, 0, PRE_OVERFLOW2_32);
- write_sysreg_s(0x3, PMCNTENSET_EL0);
+ enable_chain_counter(0);
PRINT_REGS("init");
mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E);
PRINT_REGS("After 1st loop");
- write_sysreg_s(0x0, PMCNTENSET_EL0);
+ disable_chain_counter(0);
write_regn_el0(pmevtyper, 1, CPU_CYCLES | PMEVTYPER_EXCLUDE_EL0);
write_sysreg_s(0x3, PMCNTENSET_EL0);
--
2.38.1
* Re: [kvm-unit-tests PATCH v3 4/6] arm: pmu: Fix chain counter enable/disable sequences
2023-06-19 20:03 ` [kvm-unit-tests PATCH v3 4/6] arm: pmu: Fix chain counter enable/disable sequences Eric Auger
@ 2023-06-30 15:47 ` Alexandru Elisei
0 siblings, 0 replies; 16+ messages in thread
From: Alexandru Elisei @ 2023-06-30 15:47 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, mark.rutland
Hi,
On Mon, Jun 19, 2023 at 10:03:59PM +0200, Eric Auger wrote:
> In some revisions of the ARM ARM (DDI 0487) it is said that
> disabling/enabling a pair of counters that are paired
> by a CHAIN event should follow a given sequence:
>
> Enable the high counter first, isb, enable the low counter.
> Disable the low counter first, isb, disable the high counter.
>
> This was the case in revision F.c. However this is no longer
> written in subsequent revisions such as I.a.
>
> Anyway, just in case, and because it also makes the code a
> little bit simpler, introduce 2 helpers to enable/disable chain
> counters that execute those sequences, and replace the existing
> PMCNTENCLR/ENSET calls (at least this cannot do any harm).
>
> Also fix 2 write_sysreg_s(0x0, PMCNTENSET_EL0) calls in subtests
> 5 & 6 and replace them with PMCNTENCLR writes, since writing 0 to
> PMCNTENSET_EL0 has no effect. While at it, also remove a useless
> pmevtyper setting in test_chained_sw_incr() since the type of the
> counter is not reset by pmu_reset()
Looks good to me:
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Thanks,
Alex
* [kvm-unit-tests PATCH v3 5/6] arm: pmu: Add pmu-mem-access-reliability test
2023-06-19 20:03 [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion Eric Auger
` (3 preceding siblings ...)
2023-06-19 20:03 ` [kvm-unit-tests PATCH v3 4/6] arm: pmu: Fix chain counter enable/disable sequences Eric Auger
@ 2023-06-19 20:04 ` Eric Auger
2023-06-30 16:14 ` Alexandru Elisei
2023-07-01 11:36 ` Andrew Jones
2023-06-19 20:04 ` [kvm-unit-tests PATCH v3 6/6] arm: pmu-chain-promotion: Increase the count and margin values Eric Auger
` (2 subsequent siblings)
7 siblings, 2 replies; 16+ messages in thread
From: Eric Auger @ 2023-06-19 20:04 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, alexandru.elisei
Cc: mark.rutland
Add a new basic test that runs a MEM_ACCESS loop over
100 iterations and makes sure the number of measured
MEM_ACCESS events never exceeds the margin, i.e. the
counter never unexpectedly overflows. Some other
pmu tests rely on this pattern and if the MEM_ACCESS
measurement is not reliable, it is better to report
it beforehand and not confuse the user any further.

Without the subsequent patch, this typically fails on
ThunderX2 with the following logs:
INFO: pmu: pmu-mem-access-reliability: 32-bit overflows:
overflow=1 min=21 max=41 COUNT=20 MARGIN=15
FAIL: pmu: pmu-mem-access-reliability: 32-bit overflows:
mem_access count is reliable
Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
v2 -> v3:
- rename variables as suggested by Alexandru, rework the
traces accordingly. Use all_set.
v1 -> v2:
- use mem-access instead of memaccess as suggested by Mark
- simplify the logic and add comments in the test loop
---
arm/pmu.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
arm/unittests.cfg | 6 +++++
2 files changed, 65 insertions(+)
diff --git a/arm/pmu.c b/arm/pmu.c
index 0995a249..491d2958 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -56,6 +56,11 @@
#define EXT_COMMON_EVENTS_HIGH 0x403F
#define ALL_SET_32 0x00000000FFFFFFFFULL
+#define ALL_SET_64 0xFFFFFFFFFFFFFFFFULL
+
+#define ALL_SET(__overflow_at_64bits) \
+ (__overflow_at_64bits ? ALL_SET_64 : ALL_SET_32)
+
#define ALL_CLEAR 0x0000000000000000ULL
#define PRE_OVERFLOW_32 0x00000000FFFFFFF0ULL
#define PRE_OVERFLOW_64 0xFFFFFFFFFFFFFFF0ULL
@@ -67,6 +72,10 @@
* for some observed variability we take into account a given @MARGIN
*/
#define PRE_OVERFLOW2_32 (ALL_SET_32 - COUNT - MARGIN)
+#define PRE_OVERFLOW2_64 (ALL_SET_64 - COUNT - MARGIN)
+
+#define PRE_OVERFLOW2(__overflow_at_64bits) \
+ (__overflow_at_64bits ? PRE_OVERFLOW2_64 : PRE_OVERFLOW2_32)
#define PRE_OVERFLOW(__overflow_at_64bits) \
(__overflow_at_64bits ? PRE_OVERFLOW_64 : PRE_OVERFLOW_32)
@@ -744,6 +753,53 @@ static void test_chained_sw_incr(bool unused)
read_regn_el0(pmevcntr, 0), \
read_sysreg(pmovsclr_el0))
+/*
+ * This test checks that a mem access loop featuring COUNT accesses
+ * does not overflow with an init value of PRE_OVERFLOW2. It also
+ * records the min/max access count to see how (un)reliable
+ * the counting is.
+ */
+static void test_mem_access_reliability(bool overflow_at_64bits)
+{
+ uint32_t events[] = {MEM_ACCESS};
+ void *addr = malloc(PAGE_SIZE);
+ uint64_t cntr_val, num_events, max = 0, min = pmevcntr_mask();
+ uint64_t pre_overflow2 = PRE_OVERFLOW2(overflow_at_64bits);
+ uint64_t all_set = ALL_SET(overflow_at_64bits);
+ uint64_t pmcr_lp = overflow_at_64bits ? PMU_PMCR_LP : 0;
+ bool overflow = false;
+
+ if (!satisfy_prerequisites(events, ARRAY_SIZE(events)) ||
+ !check_overflow_prerequisites(overflow_at_64bits))
+ return;
+
+ pmu_reset();
+ write_regn_el0(pmevtyper, 0, MEM_ACCESS | PMEVTYPER_EXCLUDE_EL0);
+ for (int i = 0; i < 100; i++) {
+ pmu_reset();
+ write_regn_el0(pmevcntr, 0, pre_overflow2);
+ write_sysreg_s(0x1, PMCNTENSET_EL0);
+ isb();
+ mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
+ cntr_val = read_regn_el0(pmevcntr, 0);
+ if (cntr_val >= pre_overflow2) {
+ num_events = cntr_val - pre_overflow2;
+ } else {
+ /* unexpected counter overflow */
+ num_events = cntr_val + all_set - pre_overflow2;
+ overflow = true;
+ report_info("iter=%d num_events=%ld min=%ld max=%ld overflow!!!",
+ i, num_events, min, max);
+ }
+ /* record extreme value */
+ max = MAX(num_events, max);
+ min = MIN(num_events, min);
+ }
+ report_info("overflow=%d min=%ld max=%ld expected=%d acceptable margin=%d",
+ overflow, min, max, COUNT, MARGIN);
+ report(!overflow, "mem_access count is reliable");
+}
+
static void test_chain_promotion(bool unused)
{
uint32_t events[] = {MEM_ACCESS, CHAIN};
@@ -1201,6 +1257,9 @@ int main(int argc, char *argv[])
} else if (strcmp(argv[1], "pmu-basic-event-count") == 0) {
run_event_test(argv[1], test_basic_event_count, false);
run_event_test(argv[1], test_basic_event_count, true);
+ } else if (strcmp(argv[1], "pmu-mem-access-reliability") == 0) {
+ run_event_test(argv[1], test_mem_access_reliability, false);
+ run_event_test(argv[1], test_mem_access_reliability, true);
} else if (strcmp(argv[1], "pmu-mem-access") == 0) {
run_event_test(argv[1], test_mem_access, false);
run_event_test(argv[1], test_mem_access, true);
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 5e67b558..fe601cbb 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -90,6 +90,12 @@ groups = pmu
arch = arm64
extra_params = -append 'pmu-mem-access'
+[pmu-mem-access-reliability]
+file = pmu.flat
+groups = pmu
+arch = arm64
+extra_params = -append 'pmu-mem-access-reliability'
+
[pmu-sw-incr]
file = pmu.flat
groups = pmu
--
2.38.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [kvm-unit-tests PATCH v3 5/6] arm: pmu: Add pmu-mem-access-reliability test
2023-06-19 20:04 ` [kvm-unit-tests PATCH v3 5/6] arm: pmu: Add pmu-mem-access-reliability test Eric Auger
@ 2023-06-30 16:14 ` Alexandru Elisei
2023-06-30 17:15 ` Eric Auger
2023-07-01 11:36 ` Andrew Jones
1 sibling, 1 reply; 16+ messages in thread
From: Alexandru Elisei @ 2023-06-30 16:14 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, mark.rutland
Hi,
On Mon, Jun 19, 2023 at 10:04:00PM +0200, Eric Auger wrote:
> Add a new basic test that runs MEM_ACCESS loop over
> 100 iterations and make sure the number of measured
> MEM_ACCESS never overflows the margin. Some other
> pmu tests rely on this pattern and if the MEM_ACCESS
> measurement is not reliable, it is better to report
> it beforehand and not confuse the user any further.
>
> Without the subsequent patch, this typically fails on
> ThunderXv2 with the following logs:
>
> INFO: pmu: pmu-mem-access-reliability: 32-bit overflows:
> overflow=1 min=21 max=41 COUNT=20 MARGIN=15
> FAIL: pmu: pmu-mem-access-reliability: 32-bit overflows:
> mem_access count is reliable
Looks good:
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Also ran the test on my rockpro64, with qemu and kvmtool. With kvmtool,
when running on the little cores, min=max=250, but when running on the big
cores, there was some deviation from that, I would say about 5 events,
varying from run to run. I assume this is because the little cores are in
order, and the big cores are out of order. Maybe the reason why you are
seeing such a big difference between min and max on the ThunderX2 is that
the core is speculating more aggressively.
Anyway:
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Thanks,
Alex
>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>
> ---
> v2 -> v3:
> - rename variables as suggested by Alexandru, rework the
> traces accordingly. Use all_set.
>
> v1 -> v2:
> - use mem-access instead of memaccess as suggested by Mark
> - simplify the logic and add comments in the test loop
> ---
> arm/pmu.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
> arm/unittests.cfg | 6 +++++
> 2 files changed, 65 insertions(+)
>
> diff --git a/arm/pmu.c b/arm/pmu.c
> index 0995a249..491d2958 100644
> --- a/arm/pmu.c
> +++ b/arm/pmu.c
> @@ -56,6 +56,11 @@
> #define EXT_COMMON_EVENTS_HIGH 0x403F
>
> #define ALL_SET_32 0x00000000FFFFFFFFULL
> +#define ALL_SET_64 0xFFFFFFFFFFFFFFFFULL
> +
> +#define ALL_SET(__overflow_at_64bits) \
> + (__overflow_at_64bits ? ALL_SET_64 : ALL_SET_32)
> +
> #define ALL_CLEAR 0x0000000000000000ULL
> #define PRE_OVERFLOW_32 0x00000000FFFFFFF0ULL
> #define PRE_OVERFLOW_64 0xFFFFFFFFFFFFFFF0ULL
> @@ -67,6 +72,10 @@
> * for some observed variability we take into account a given @MARGIN
> */
> #define PRE_OVERFLOW2_32 (ALL_SET_32 - COUNT - MARGIN)
> +#define PRE_OVERFLOW2_64 (ALL_SET_64 - COUNT - MARGIN)
> +
> +#define PRE_OVERFLOW2(__overflow_at_64bits) \
> + (__overflow_at_64bits ? PRE_OVERFLOW2_64 : PRE_OVERFLOW2_32)
>
> #define PRE_OVERFLOW(__overflow_at_64bits) \
> (__overflow_at_64bits ? PRE_OVERFLOW_64 : PRE_OVERFLOW_32)
> @@ -744,6 +753,53 @@ static void test_chained_sw_incr(bool unused)
> read_regn_el0(pmevcntr, 0), \
> read_sysreg(pmovsclr_el0))
>
> +/*
> + * This test checks that a mem access loop featuring COUNT accesses
> + * does not overflow with an init value of PRE_OVERFLOW2. It also
> + * records the min/max access count to see how much the counting
> + * is (un)reliable
> + */
> +static void test_mem_access_reliability(bool overflow_at_64bits)
> +{
> + uint32_t events[] = {MEM_ACCESS};
> + void *addr = malloc(PAGE_SIZE);
> + uint64_t cntr_val, num_events, max = 0, min = pmevcntr_mask();
> + uint64_t pre_overflow2 = PRE_OVERFLOW2(overflow_at_64bits);
> + uint64_t all_set = ALL_SET(overflow_at_64bits);
> + uint64_t pmcr_lp = overflow_at_64bits ? PMU_PMCR_LP : 0;
> + bool overflow = false;
> +
> + if (!satisfy_prerequisites(events, ARRAY_SIZE(events)) ||
> + !check_overflow_prerequisites(overflow_at_64bits))
> + return;
> +
> + pmu_reset();
> + write_regn_el0(pmevtyper, 0, MEM_ACCESS | PMEVTYPER_EXCLUDE_EL0);
> + for (int i = 0; i < 100; i++) {
> + pmu_reset();
> + write_regn_el0(pmevcntr, 0, pre_overflow2);
> + write_sysreg_s(0x1, PMCNTENSET_EL0);
> + isb();
> + mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
> + cntr_val = read_regn_el0(pmevcntr, 0);
> + if (cntr_val >= pre_overflow2) {
> + num_events = cntr_val - pre_overflow2;
> + } else {
> + /* unexpected counter overflow */
> + num_events = cntr_val + all_set - pre_overflow2;
> + overflow = true;
> + report_info("iter=%d num_events=%ld min=%ld max=%ld overflow!!!",
> + i, num_events, min, max);
> + }
> + /* record extreme value */
> + max = MAX(num_events, max);
> + min = MIN(num_events, min);
> + }
> + report_info("overflow=%d min=%ld max=%ld expected=%d acceptable margin=%d",
> + overflow, min, max, COUNT, MARGIN);
> + report(!overflow, "mem_access count is reliable");
> +}
> +
> static void test_chain_promotion(bool unused)
> {
> uint32_t events[] = {MEM_ACCESS, CHAIN};
> @@ -1201,6 +1257,9 @@ int main(int argc, char *argv[])
> } else if (strcmp(argv[1], "pmu-basic-event-count") == 0) {
> run_event_test(argv[1], test_basic_event_count, false);
> run_event_test(argv[1], test_basic_event_count, true);
> + } else if (strcmp(argv[1], "pmu-mem-access-reliability") == 0) {
> + run_event_test(argv[1], test_mem_access_reliability, false);
> + run_event_test(argv[1], test_mem_access_reliability, true);
> } else if (strcmp(argv[1], "pmu-mem-access") == 0) {
> run_event_test(argv[1], test_mem_access, false);
> run_event_test(argv[1], test_mem_access, true);
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 5e67b558..fe601cbb 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -90,6 +90,12 @@ groups = pmu
> arch = arm64
> extra_params = -append 'pmu-mem-access'
>
> +[pmu-mem-access-reliability]
> +file = pmu.flat
> +groups = pmu
> +arch = arm64
> +extra_params = -append 'pmu-mem-access-reliability'
> +
> [pmu-sw-incr]
> file = pmu.flat
> groups = pmu
> --
> 2.38.1
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [kvm-unit-tests PATCH v3 5/6] arm: pmu: Add pmu-mem-access-reliability test
2023-06-30 16:14 ` Alexandru Elisei
@ 2023-06-30 17:15 ` Eric Auger
0 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2023-06-30 17:15 UTC (permalink / raw)
To: Alexandru Elisei
Cc: eric.auger.pro, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, mark.rutland
Hi Alexandru,
On 6/30/23 18:14, Alexandru Elisei wrote:
> Hi,
>
> On Mon, Jun 19, 2023 at 10:04:00PM +0200, Eric Auger wrote:
>> Add a new basic test that runs MEM_ACCESS loop over
>> 100 iterations and make sure the number of measured
>> MEM_ACCESS never overflows the margin. Some other
>> pmu tests rely on this pattern and if the MEM_ACCESS
>> measurement is not reliable, it is better to report
>> it beforehand and not confuse the user any further.
>>
>> Without the subsequent patch, this typically fails on
>> ThunderXv2 with the following logs:
>>
>> INFO: pmu: pmu-mem-access-reliability: 32-bit overflows:
>> overflow=1 min=21 max=41 COUNT=20 MARGIN=15
>> FAIL: pmu: pmu-mem-access-reliability: 32-bit overflows:
>> mem_access count is reliable
> Looks good:
>
> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
>
> Also ran the test on my rockpro64, with qemu and kvmtool. With kvmtool,
> when running on the little cores, min=max=250, but when running on the big
> cores, there was some deviation from that, I would say about 5 events,
> varying from run to run. I assume this is because the little cores are in
> order, and the big cores are out of order. Maybe the reason why you are
> seeing such a big difference between min and max on the ThunderX 2, the
> core might be speculating more aggressively.
Thank you for your time. At least this new test, by outputting the min/max,
will give interesting indications of what can be expected from the mem
access count precision on a given HW.
>
> Anyway:
>
> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Thanks!
Eric
>
> Thanks,
> Alex
>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>> v2 -> v3:
>> - rename variables as suggested by Alexandru, rework the
>> traces accordingly. Use all_set.
>>
>> v1 -> v2:
>> - use mem-access instead of memaccess as suggested by Mark
>> - simplify the logic and add comments in the test loop
>> ---
>> arm/pmu.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++
>> arm/unittests.cfg | 6 +++++
>> 2 files changed, 65 insertions(+)
>>
>> diff --git a/arm/pmu.c b/arm/pmu.c
>> index 0995a249..491d2958 100644
>> --- a/arm/pmu.c
>> +++ b/arm/pmu.c
>> @@ -56,6 +56,11 @@
>> #define EXT_COMMON_EVENTS_HIGH 0x403F
>>
>> #define ALL_SET_32 0x00000000FFFFFFFFULL
>> +#define ALL_SET_64 0xFFFFFFFFFFFFFFFFULL
>> +
>> +#define ALL_SET(__overflow_at_64bits) \
>> + (__overflow_at_64bits ? ALL_SET_64 : ALL_SET_32)
>> +
>> #define ALL_CLEAR 0x0000000000000000ULL
>> #define PRE_OVERFLOW_32 0x00000000FFFFFFF0ULL
>> #define PRE_OVERFLOW_64 0xFFFFFFFFFFFFFFF0ULL
>> @@ -67,6 +72,10 @@
>> * for some observed variability we take into account a given @MARGIN
>> */
>> #define PRE_OVERFLOW2_32 (ALL_SET_32 - COUNT - MARGIN)
>> +#define PRE_OVERFLOW2_64 (ALL_SET_64 - COUNT - MARGIN)
>> +
>> +#define PRE_OVERFLOW2(__overflow_at_64bits) \
>> + (__overflow_at_64bits ? PRE_OVERFLOW2_64 : PRE_OVERFLOW2_32)
>>
>> #define PRE_OVERFLOW(__overflow_at_64bits) \
>> (__overflow_at_64bits ? PRE_OVERFLOW_64 : PRE_OVERFLOW_32)
>> @@ -744,6 +753,53 @@ static void test_chained_sw_incr(bool unused)
>> read_regn_el0(pmevcntr, 0), \
>> read_sysreg(pmovsclr_el0))
>>
>> +/*
>> + * This test checks that a mem access loop featuring COUNT accesses
>> + * does not overflow with an init value of PRE_OVERFLOW2. It also
>> + * records the min/max access count to see how much the counting
>> + * is (un)reliable
>> + */
>> +static void test_mem_access_reliability(bool overflow_at_64bits)
>> +{
>> + uint32_t events[] = {MEM_ACCESS};
>> + void *addr = malloc(PAGE_SIZE);
>> + uint64_t cntr_val, num_events, max = 0, min = pmevcntr_mask();
>> + uint64_t pre_overflow2 = PRE_OVERFLOW2(overflow_at_64bits);
>> + uint64_t all_set = ALL_SET(overflow_at_64bits);
>> + uint64_t pmcr_lp = overflow_at_64bits ? PMU_PMCR_LP : 0;
>> + bool overflow = false;
>> +
>> + if (!satisfy_prerequisites(events, ARRAY_SIZE(events)) ||
>> + !check_overflow_prerequisites(overflow_at_64bits))
>> + return;
>> +
>> + pmu_reset();
>> + write_regn_el0(pmevtyper, 0, MEM_ACCESS | PMEVTYPER_EXCLUDE_EL0);
>> + for (int i = 0; i < 100; i++) {
>> + pmu_reset();
>> + write_regn_el0(pmevcntr, 0, pre_overflow2);
>> + write_sysreg_s(0x1, PMCNTENSET_EL0);
>> + isb();
>> + mem_access_loop(addr, COUNT, pmu.pmcr_ro | PMU_PMCR_E | pmcr_lp);
>> + cntr_val = read_regn_el0(pmevcntr, 0);
>> + if (cntr_val >= pre_overflow2) {
>> + num_events = cntr_val - pre_overflow2;
>> + } else {
>> + /* unexpected counter overflow */
>> + num_events = cntr_val + all_set - pre_overflow2;
>> + overflow = true;
>> + report_info("iter=%d num_events=%ld min=%ld max=%ld overflow!!!",
>> + i, num_events, min, max);
>> + }
>> + /* record extreme value */
>> + max = MAX(num_events, max);
>> + min = MIN(num_events, min);
>> + }
>> + report_info("overflow=%d min=%ld max=%ld expected=%d acceptable margin=%d",
>> + overflow, min, max, COUNT, MARGIN);
>> + report(!overflow, "mem_access count is reliable");
>> +}
>> +
>> static void test_chain_promotion(bool unused)
>> {
>> uint32_t events[] = {MEM_ACCESS, CHAIN};
>> @@ -1201,6 +1257,9 @@ int main(int argc, char *argv[])
>> } else if (strcmp(argv[1], "pmu-basic-event-count") == 0) {
>> run_event_test(argv[1], test_basic_event_count, false);
>> run_event_test(argv[1], test_basic_event_count, true);
>> + } else if (strcmp(argv[1], "pmu-mem-access-reliability") == 0) {
>> + run_event_test(argv[1], test_mem_access_reliability, false);
>> + run_event_test(argv[1], test_mem_access_reliability, true);
>> } else if (strcmp(argv[1], "pmu-mem-access") == 0) {
>> run_event_test(argv[1], test_mem_access, false);
>> run_event_test(argv[1], test_mem_access, true);
>> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
>> index 5e67b558..fe601cbb 100644
>> --- a/arm/unittests.cfg
>> +++ b/arm/unittests.cfg
>> @@ -90,6 +90,12 @@ groups = pmu
>> arch = arm64
>> extra_params = -append 'pmu-mem-access'
>>
>> +[pmu-mem-access-reliability]
>> +file = pmu.flat
>> +groups = pmu
>> +arch = arm64
>> +extra_params = -append 'pmu-mem-access-reliability'
>> +
>> [pmu-sw-incr]
>> file = pmu.flat
>> groups = pmu
>> --
>> 2.38.1
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [kvm-unit-tests PATCH v3 5/6] arm: pmu: Add pmu-mem-access-reliability test
2023-06-19 20:04 ` [kvm-unit-tests PATCH v3 5/6] arm: pmu: Add pmu-mem-access-reliability test Eric Auger
2023-06-30 16:14 ` Alexandru Elisei
@ 2023-07-01 11:36 ` Andrew Jones
2023-07-03 8:08 ` Eric Auger
1 sibling, 1 reply; 16+ messages in thread
From: Andrew Jones @ 2023-07-01 11:36 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, kvm, kvmarm, maz, will, oliver.upton, ricarkol,
reijiw, alexandru.elisei, mark.rutland
On Mon, Jun 19, 2023 at 10:04:00PM +0200, Eric Auger wrote:
...
> @@ -1201,6 +1257,9 @@ int main(int argc, char *argv[])
> } else if (strcmp(argv[1], "pmu-basic-event-count") == 0) {
> run_event_test(argv[1], test_basic_event_count, false);
> run_event_test(argv[1], test_basic_event_count, true);
> + } else if (strcmp(argv[1], "pmu-mem-access-reliability") == 0) {
> + run_event_test(argv[1], test_mem_access_reliability, false);
> + run_event_test(argv[1], test_mem_access_reliability, true);
This breaks compilation for arm since this patch is missing the stub.
I've added it.
Thanks,
drew
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [kvm-unit-tests PATCH v3 5/6] arm: pmu: Add pmu-mem-access-reliability test
2023-07-01 11:36 ` Andrew Jones
@ 2023-07-03 8:08 ` Eric Auger
0 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2023-07-03 8:08 UTC (permalink / raw)
To: Andrew Jones
Cc: eric.auger.pro, kvm, kvmarm, maz, will, oliver.upton, ricarkol,
reijiw, alexandru.elisei, mark.rutland
Hi Drew,
On 7/1/23 13:36, Andrew Jones wrote:
> On Mon, Jun 19, 2023 at 10:04:00PM +0200, Eric Auger wrote:
> ...
>> @@ -1201,6 +1257,9 @@ int main(int argc, char *argv[])
>> } else if (strcmp(argv[1], "pmu-basic-event-count") == 0) {
>> run_event_test(argv[1], test_basic_event_count, false);
>> run_event_test(argv[1], test_basic_event_count, true);
>> + } else if (strcmp(argv[1], "pmu-mem-access-reliability") == 0) {
>> + run_event_test(argv[1], test_mem_access_reliability, false);
>> + run_event_test(argv[1], test_mem_access_reliability, true);
> This breaks compilation for arm since this patch is missing the stub.
> I've added it.
sorry for the oversight and thanks for adding it ;-)
Eric
>
> Thanks,
> drew
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [kvm-unit-tests PATCH v3 6/6] arm: pmu-chain-promotion: Increase the count and margin values
2023-06-19 20:03 [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion Eric Auger
` (4 preceding siblings ...)
2023-06-19 20:04 ` [kvm-unit-tests PATCH v3 5/6] arm: pmu: Add pmu-mem-access-reliability test Eric Auger
@ 2023-06-19 20:04 ` Eric Auger
2023-06-28 7:44 ` [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion Eric Auger
2023-07-01 12:20 ` Andrew Jones
7 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2023-06-19 20:04 UTC (permalink / raw)
To: eric.auger.pro, eric.auger, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, alexandru.elisei
Cc: mark.rutland
Let's increase the mem_access loop count by defining COUNT=250
(instead of 20) and define a more generous margin (100 instead
of 15) so that we stand a better chance of accommodating HW
implementation variance. Both values were chosen arbitrarily
on the high side. They fix the random failures on ThunderX2
machines.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
---
arm/pmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arm/pmu.c b/arm/pmu.c
index 491d2958..63822d19 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -64,8 +64,8 @@
#define ALL_CLEAR 0x0000000000000000ULL
#define PRE_OVERFLOW_32 0x00000000FFFFFFF0ULL
#define PRE_OVERFLOW_64 0xFFFFFFFFFFFFFFF0ULL
-#define COUNT 20
-#define MARGIN 15
+#define COUNT 250
+#define MARGIN 100
/*
* PRE_OVERFLOW2 is set so that 1st @COUNT iterations do not
* produce 32b overflow and 2nd @COUNT iterations do. To accommodate
--
2.38.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion
2023-06-19 20:03 [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion Eric Auger
` (5 preceding siblings ...)
2023-06-19 20:04 ` [kvm-unit-tests PATCH v3 6/6] arm: pmu-chain-promotion: Increase the count and margin values Eric Auger
@ 2023-06-28 7:44 ` Eric Auger
2023-06-28 10:18 ` Alexandru Elisei
2023-07-01 12:20 ` Andrew Jones
7 siblings, 1 reply; 16+ messages in thread
From: Eric Auger @ 2023-06-28 7:44 UTC (permalink / raw)
To: eric.auger.pro, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, alexandru.elisei
Cc: mark.rutland
Hi Alexandru, Drew,
On 6/19/23 22:03, Eric Auger wrote:
> On some HW (ThunderX2), some random failures of
> pmu-chain-promotion test can be observed.
>
> pmu-chain-promotion is composed of several subtests
> which run 2 mem_access loops. The initial value of
> the counter is set so that no overflow is expected on
> the first loop run and overflow is expected on the second.
> However it is observed that sometimes we get an overflow
> on the first run. It looks related to some variability of
> the mem_access count. This variability is observed on all
> HW I have access to, though with different spans. On
> ThunderX2 HW it looks like the margin that is currently taken
> is too small and we regularly hit failures.
>
> Although the first goal of this series is to increase
> the count/margin used in those tests, it also attempts
> to improve the pmu-chain-promotion logs, add some barriers
> in the mem-access loop, clarify the chain counter
> enable/disable sequence.
>
> A new 'pmu-mem-access-reliability' is also introduced to
> detect issues with MEM_ACCESS event variability and make
> the debug easier.
>
> Obviously one can wonder if this variability is something normal
> and does not hide any other bug. I hope this series will raise
> additional discussions about this.
>
> https://github.com/eauger/kut/tree/pmu-chain-promotion-fixes-v3
>
> History:
>
> v2 -> v3:
> - took into account Alexandru's comments. See individual log
> files
Gentle ping. Does this version match all your expectations?
Thanks
Eric
>
> v1 -> v2:
> - Take into account Alexandru's & Mark's comments. Added some
> R-b's and T-b's.
>
>
> Eric Auger (6):
> arm: pmu: pmu-chain-promotion: Improve debug messages
> arm: pmu: pmu-chain-promotion: Introduce defines for count and margin
> values
> arm: pmu: Add extra DSB barriers in the mem_access loop
> arm: pmu: Fix chain counter enable/disable sequences
> arm: pmu: Add pmu-mem-access-reliability test
> arm: pmu-chain-promotion: Increase the count and margin values
>
> arm/pmu.c | 208 ++++++++++++++++++++++++++++++++--------------
> arm/unittests.cfg | 6 ++
> 2 files changed, 153 insertions(+), 61 deletions(-)
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion
2023-06-28 7:44 ` [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion Eric Auger
@ 2023-06-28 10:18 ` Alexandru Elisei
2023-06-28 11:45 ` Eric Auger
0 siblings, 1 reply; 16+ messages in thread
From: Alexandru Elisei @ 2023-06-28 10:18 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, mark.rutland
Hi,
On Wed, Jun 28, 2023 at 09:44:44AM +0200, Eric Auger wrote:
> Hi Alexandru, Drew,
>
> On 6/19/23 22:03, Eric Auger wrote:
> > On some HW (ThunderXv2), some random failures of
> > pmu-chain-promotion test can be observed.
> >
> > pmu-chain-promotion is composed of several subtests
> > which run 2 mem_access loops. The initial value of
> > the counter is set so that no overflow is expected on
> > the first loop run and overflow is expected on the second.
> > However it is observed that sometimes we get an overflow
> > on the first run. It looks related to some variability of
> > the mem_acess count. This variability is observed on all
> > HW I have access to, with different span though. On
> > ThunderX2 HW it looks the margin that is currently taken
> > is too small and we regularly hit failure.
> >
> > although the first goal of this series is to increase
> > the count/margin used in those tests, it also attempts
> > to improve the pmu-chain-promotion logs, add some barriers
> > in the mem-access loop, clarify the chain counter
> > enable/disable sequence.
> >
> > A new 'pmu-mem-access-reliability' is also introduced to
> > detect issues with MEM_ACCESS event variability and make
> > the debug easier.
> >
> > Obviously one can wonder if this variability is something normal
> > and does not hide any other bug. I hope this series will raise
> > additional discussions about this.
> >
> > https://github.com/eauger/kut/tree/pmu-chain-promotion-fixes-v3
> >
> > History:
> >
> > v2 -> v3:
> > - took into account Alexandru's comments. See individual log
> > files
> Gentle ping. Does this version match all your expectations?
The series is on my radar; I'll have a look this Friday.
Thanks,
Alex
>
> Thanks
>
> Eric
> >
> > v1 -> v2:
> > - Take into account Alexandru's & Mark's comments. Added some
> > R-b's and T-b's.
> >
> >
> > Eric Auger (6):
> > arm: pmu: pmu-chain-promotion: Improve debug messages
> > arm: pmu: pmu-chain-promotion: Introduce defines for count and margin
> > values
> > arm: pmu: Add extra DSB barriers in the mem_access loop
> > arm: pmu: Fix chain counter enable/disable sequences
> > arm: pmu: Add pmu-mem-access-reliability test
> > arm: pmu-chain-promotion: Increase the count and margin values
> >
> > arm/pmu.c | 208 ++++++++++++++++++++++++++++++++--------------
> > arm/unittests.cfg | 6 ++
> > 2 files changed, 153 insertions(+), 61 deletions(-)
> >
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion
2023-06-28 10:18 ` Alexandru Elisei
@ 2023-06-28 11:45 ` Eric Auger
0 siblings, 0 replies; 16+ messages in thread
From: Eric Auger @ 2023-06-28 11:45 UTC (permalink / raw)
To: Alexandru Elisei
Cc: eric.auger.pro, kvm, kvmarm, andrew.jones, maz, will,
oliver.upton, ricarkol, reijiw, mark.rutland
Hi Alexandru,
On 6/28/23 12:18, Alexandru Elisei wrote:
> Hi,
>
> On Wed, Jun 28, 2023 at 09:44:44AM +0200, Eric Auger wrote:
>> Hi Alexandru, Drew,
>>
>> On 6/19/23 22:03, Eric Auger wrote:
>>> On some HW (ThunderXv2), some random failures of
>>> pmu-chain-promotion test can be observed.
>>>
>>> pmu-chain-promotion is composed of several subtests
>>> which run 2 mem_access loops. The initial value of
>>> the counter is set so that no overflow is expected on
>>> the first loop run and overflow is expected on the second.
>>> However it is observed that sometimes we get an overflow
>>> on the first run. It looks related to some variability of
>>> the mem_acess count. This variability is observed on all
>>> HW I have access to, with different span though. On
>>> ThunderX2 HW it looks the margin that is currently taken
>>> is too small and we regularly hit failure.
>>>
>>> although the first goal of this series is to increase
>>> the count/margin used in those tests, it also attempts
>>> to improve the pmu-chain-promotion logs, add some barriers
>>> in the mem-access loop, clarify the chain counter
>>> enable/disable sequence.
>>>
>>> A new 'pmu-mem-access-reliability' is also introduced to
>>> detect issues with MEM_ACCESS event variability and make
>>> the debug easier.
>>>
>>> Obviously one can wonder if this variability is something normal
>>> and does not hide any other bug. I hope this series will raise
>>> additional discussions about this.
>>>
>>> https://github.com/eauger/kut/tree/pmu-chain-promotion-fixes-v3
>>>
>>> History:
>>>
>>> v2 -> v3:
>>> - took into account Alexandru's comments. See individual log
>>> files
>> Gentle ping. Does this version match all your expectations?
> The series are on my radar, I'll have a look this Friday.
OK thanks :-)
Eric
>
> Thanks,
> Alex
>
>> Thanks
>>
>> Eric
>>> v1 -> v2:
>>> - Take into account Alexandru's & Mark's comments. Added some
>>> R-b's and T-b's.
>>>
>>>
>>> Eric Auger (6):
>>> arm: pmu: pmu-chain-promotion: Improve debug messages
>>> arm: pmu: pmu-chain-promotion: Introduce defines for count and margin
>>> values
>>> arm: pmu: Add extra DSB barriers in the mem_access loop
>>> arm: pmu: Fix chain counter enable/disable sequences
>>> arm: pmu: Add pmu-mem-access-reliability test
>>> arm: pmu-chain-promotion: Increase the count and margin values
>>>
>>> arm/pmu.c | 208 ++++++++++++++++++++++++++++++++--------------
>>> arm/unittests.cfg | 6 ++
>>> 2 files changed, 153 insertions(+), 61 deletions(-)
>>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion
2023-06-19 20:03 [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion Eric Auger
` (6 preceding siblings ...)
2023-06-28 7:44 ` [kvm-unit-tests PATCH v3 0/6] arm: pmu: Fix random failures of pmu-chain-promotion Eric Auger
@ 2023-07-01 12:20 ` Andrew Jones
7 siblings, 0 replies; 16+ messages in thread
From: Andrew Jones @ 2023-07-01 12:20 UTC (permalink / raw)
To: Eric Auger
Cc: eric.auger.pro, kvm, kvmarm, maz, will, oliver.upton, ricarkol,
reijiw, alexandru.elisei, mark.rutland
On Mon, Jun 19, 2023 at 10:03:55PM +0200, Eric Auger wrote:
> On some HW (ThunderXv2), some random failures of
> pmu-chain-promotion test can be observed.
>
> pmu-chain-promotion is composed of several subtests
> which run 2 mem_access loops. The initial value of
> the counter is set so that no overflow is expected on
> the first loop run and overflow is expected on the second.
> However it is observed that sometimes we get an overflow
> on the first run. It looks related to some variability of
> the mem_acess count. This variability is observed on all
> HW I have access to, with different span though. On
> ThunderX2 HW it looks the margin that is currently taken
> is too small and we regularly hit failure.
>
> although the first goal of this series is to increase
> the count/margin used in those tests, it also attempts
> to improve the pmu-chain-promotion logs, add some barriers
> in the mem-access loop, clarify the chain counter
> enable/disable sequence.
>
> A new 'pmu-mem-access-reliability' is also introduced to
> detect issues with MEM_ACCESS event variability and make
> the debug easier.
>
> Obviously one can wonder if this variability is something normal
> and does not hide any other bug. I hope this series will raise
> additional discussions about this.
>
> https://github.com/eauger/kut/tree/pmu-chain-promotion-fixes-v3
>
> History:
>
> v2 -> v3:
> - took into account Alexandru's comments. See individual log
> files
>
> v1 -> v2:
> - Take into account Alexandru's & Mark's comments. Added some
> R-b's and T-b's.
>
>
> Eric Auger (6):
> arm: pmu: pmu-chain-promotion: Improve debug messages
> arm: pmu: pmu-chain-promotion: Introduce defines for count and margin
> values
> arm: pmu: Add extra DSB barriers in the mem_access loop
> arm: pmu: Fix chain counter enable/disable sequences
> arm: pmu: Add pmu-mem-access-reliability test
> arm: pmu-chain-promotion: Increase the count and margin values
>
> arm/pmu.c | 208 ++++++++++++++++++++++++++++++++--------------
> arm/unittests.cfg | 6 ++
> 2 files changed, 153 insertions(+), 61 deletions(-)
>
> --
> 2.38.1
>
Merged. Thanks, Eric!
drew
^ permalink raw reply [flat|nested] 16+ messages in thread