* [kvm-unit-tests PATCH 1/2] x86/apic: separate reporting from actual measurements
2026-04-28 13:35 [kvm-unit-tests PATCH 0/2] x86/apic: fix false test_apic_change_mode failures on stalled vCPUs Igor Mammedov
@ 2026-04-28 13:35 ` Igor Mammedov
2026-04-28 13:35 ` [kvm-unit-tests PATCH 2/2] x86/apic: add retry logic to test_apic_change_mode Igor Mammedov
1 sibling, 0 replies; 3+ messages in thread
From: Igor Mammedov @ 2026-04-28 13:35 UTC (permalink / raw)
To: kvm; +Cc: pbonzini
Move report() calls to the end of the function so that serial I/O
does not interfere with timer measurements.
While at it, wrap the test body in a for loop as preparation
for adding retry logic to deal with failures due to large
enough vCPU stalls.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
x86/apic.c | 109 +++++++++++++++++++++++++++++++----------------------
1 file changed, 63 insertions(+), 46 deletions(-)
diff --git a/x86/apic.c b/x86/apic.c
index 0a52e9a4..d4eb8e11 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -573,62 +573,79 @@ static inline void apic_change_mode(unsigned long new_mode)
static void test_apic_change_mode(void)
{
- uint32_t tmict = 0x999999;
+ const uint32_t tmict_values[] = {
+ 0x999999, /* ~10ms */
+ };
+ int retry, max_retries = ARRAY_SIZE(tmict_values);
+ uint32_t tmict;
+ bool tmict_reset = false, o_nonzero = false, o_reached_zero = false;
+ bool p_nonzero = false, p_not_reset = false, p_after_wrap = false;
+ bool p2o_not_reset = false, p2o_reached_zero = false, p2o_stay_zero = false;
printf("starting apic change mode\n");
- apic_write(APIC_TMICT, tmict);
+ for (retry = 0; retry < max_retries; retry++) {
+ tmict = tmict_values[retry];
+ apic_write(APIC_TMICT, tmict);
+ apic_change_mode(APIC_LVT_TIMER_PERIODIC);
+ tmict_reset = apic_read(APIC_TMICT) == tmict;
- apic_change_mode(APIC_LVT_TIMER_PERIODIC);
+ /* Testing one-shot */
+ apic_change_mode(APIC_LVT_TIMER_ONESHOT);
+ apic_write(APIC_TMICT, tmict);
+ o_nonzero = apic_read(APIC_TMCCT);
- report(apic_read(APIC_TMICT) == tmict, "TMICT value reset");
+ wait_until_tmcct_is_zero(tmict, false);
+ o_reached_zero = !apic_read(APIC_TMCCT);
- /* Testing one-shot */
- apic_change_mode(APIC_LVT_TIMER_ONESHOT);
- apic_write(APIC_TMICT, tmict);
- report(apic_read(APIC_TMCCT), "TMCCT should have a non-zero value");
+ /*
+ * Write TMICT before changing mode from one-shot to periodic
+ * TMCCT should be reset to TMICT periodicly
+ */
+ apic_write(APIC_TMICT, tmict);
+ wait_until_tmcct_is_zero(tmict, true);
+ apic_change_mode(APIC_LVT_TIMER_PERIODIC);
+ p_nonzero = apic_read(APIC_TMCCT);
- wait_until_tmcct_is_zero(tmict, false);
- report(!apic_read(APIC_TMCCT), "TMCCT should have reached 0");
+ /*
+ * After the change of mode, the counter should not be reset
+ * and continue counting down from where it was
+ */
+ p_not_reset = apic_read(APIC_TMCCT) < (tmict / 2);
+ /*
+ * Specifically wait for timer wrap around and skip 0.
+ * Under KVM lapic there is a possibility that a small amount
+ * of consecutive TMCCR reads return 0 while hrtimer is reset
+ * in an async callback
+ */
+ wait_until_tmcct_wrap_around(tmict, false);
+ p_after_wrap = apic_read(APIC_TMCCT) > (tmict / 2);
- /*
- * Write TMICT before changing mode from one-shot to periodic TMCCT should
- * be reset to TMICT periodicly
- */
- apic_write(APIC_TMICT, tmict);
- wait_until_tmcct_is_zero(tmict, true);
- apic_change_mode(APIC_LVT_TIMER_PERIODIC);
- report(apic_read(APIC_TMCCT), "TMCCT should have a non-zero value");
+ wait_until_tmcct_is_zero(tmict, true);
+ /*
+ * Keep the same TMICT and change timer mode to one-shot
+ * TMCCT should be > 0 and count-down to 0
+ */
+ apic_change_mode(APIC_LVT_TIMER_ONESHOT);
+ p2o_not_reset = apic_read(APIC_TMCCT) < (tmict / 2);
- /*
- * After the change of mode, the counter should not be reset and continue
- * counting down from where it was
- */
- report(apic_read(APIC_TMCCT) < (tmict / 2),
- "TMCCT should not be reset to TMICT value");
- /*
- * Specifically wait for timer wrap around and skip 0.
- * Under KVM lapic there is a possibility that a small amount of consecutive
- * TMCCR reads return 0 while hrtimer is reset in an async callback
- */
- wait_until_tmcct_wrap_around(tmict, false);
- report(apic_read(APIC_TMCCT) > (tmict / 2),
- "TMCCT should be reset to the initial-count");
+ wait_until_tmcct_is_zero(tmict, false);
+ p2o_reached_zero = !apic_read(APIC_TMCCT);
- wait_until_tmcct_is_zero(tmict, true);
- /*
- * Keep the same TMICT and change timer mode to one-shot
- * TMCCT should be > 0 and count-down to 0
- */
- apic_change_mode(APIC_LVT_TIMER_ONESHOT);
- report(apic_read(APIC_TMCCT) < (tmict / 2),
- "TMCCT should not be reset to init");
- wait_until_tmcct_is_zero(tmict, false);
- report(!apic_read(APIC_TMCCT), "TMCCT should have reach zero");
-
- /* now tmcct == 0 and tmict != 0 */
- apic_change_mode(APIC_LVT_TIMER_PERIODIC);
- report(!apic_read(APIC_TMCCT), "TMCCT should stay at zero");
+ /* now tmcct == 0 and tmict != 0 */
+ apic_change_mode(APIC_LVT_TIMER_PERIODIC);
+ p2o_stay_zero = !apic_read(APIC_TMCCT);
+ }
+
+ report(tmict_reset, "TMICT value reset");
+ report(o_nonzero, "one-shot: TMCCT should have a non-zero value");
+ report(o_reached_zero, "one-shot: TMCCT should have reached 0");
+ report(p_nonzero, "periodic: TMCCT should have a non-zero value");
+ report(p_not_reset, "TMCCT should not be reset to TMICT value");
+ report(p_after_wrap, "TMCCT should be reset to the initial-count");
+ report(p2o_not_reset, "TMCCT should not be reset to init");
+ report(p2o_reached_zero, "TMCCT should have reach zero");
+ report(p2o_stay_zero, "TMCCT should stay at zero");
}
#define KVM_HC_SEND_IPI 10
--
2.47.3
^ permalink raw reply related [flat|nested] 3+ messages in thread* [kvm-unit-tests PATCH 2/2] x86/apic: add retry logic to test_apic_change_mode
2026-04-28 13:35 [kvm-unit-tests PATCH 0/2] x86/apic: fix false test_apic_change_mode failures on stalled vCPUs Igor Mammedov
2026-04-28 13:35 ` [kvm-unit-tests PATCH 1/2] x86/apic: separate reporting from actual measurements Igor Mammedov
@ 2026-04-28 13:35 ` Igor Mammedov
1 sibling, 0 replies; 3+ messages in thread
From: Igor Mammedov @ 2026-04-28 13:35 UTC (permalink / raw)
To: kvm; +Cc: pbonzini
The APIC timer runs at wall clock time under KVM. If vCPU is stalled for
long enough, timer can expire before the guest reads TMCCT or tick well
past expected values, causing various false test failures [1].
Add retry attempts with increasing timer period (10ms, 60ms, 700ms)
if any test fails, to handle spurious failures due to vCPU stalls.
1) Failures we sometimes observe in CI are:
"FAIL: TMCCT should have a non-zero value"
"FAIL: TMCCT should be reset to the initial-count"
"FAIL: TMCCT should not be reset to TMICT value"
Seen on both Intel and AMD hosts.
PS:
on most test runs, test completes fine on the 1st iteration.
The patch would affect only failure path which will be slowed down
due to retries but still fail if there is a bug with benefit of
getting rid of false positives caused by vCPU stalls.
PS2:
Number of tries and tmict delta comes from analyzing vcpu stalls
on heavily overcommited Haswell host with test being bounced between
2 sockets.
Typically 2nd iteration (60ms) is enough to get rid of
false positives.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
x86/apic.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/x86/apic.c b/x86/apic.c
index d4eb8e11..27597323 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -575,9 +575,12 @@ static void test_apic_change_mode(void)
{
const uint32_t tmict_values[] = {
0x999999, /* ~10ms */
+ 0x3938700, /* ~60ms */
+ 0x29b92700, /* ~700ms */
};
int retry, max_retries = ARRAY_SIZE(tmict_values);
uint32_t tmict;
+ bool fail;
bool tmict_reset = false, o_nonzero = false, o_reached_zero = false;
bool p_nonzero = false, p_not_reset = false, p_after_wrap = false;
bool p2o_not_reset = false, p2o_reached_zero = false, p2o_stay_zero = false;
@@ -635,6 +638,11 @@ static void test_apic_change_mode(void)
/* now tmcct == 0 and tmict != 0 */
apic_change_mode(APIC_LVT_TIMER_PERIODIC);
p2o_stay_zero = !apic_read(APIC_TMCCT);
+ fail = !(tmict_reset && o_nonzero && o_reached_zero &&
+ p_nonzero && p_not_reset && p_after_wrap &&
+ p2o_not_reset && p2o_reached_zero && p2o_stay_zero);
+ if (!fail)
+ break;
}
report(tmict_reset, "TMICT value reset");
--
2.47.3
^ permalink raw reply related [flat|nested] 3+ messages in thread