From: Igor Mammedov <imammedo@redhat.com>
To: kvm@vger.kernel.org
Cc: pbonzini@redhat.com
Subject: [kvm-unit-tests PATCH 0/2] x86/apic: fix false test_apic_change_mode failures on stalled vCPUs
Date: Tue, 28 Apr 2026 15:35:22 +0200
Message-ID: <20260428133524.3628482-1-imammedo@redhat.com>

test_apic_change_mode sporadically fails in CI on both Intel and AMD
hosts with errors like:
  "FAIL: TMCCT should have a non-zero value"
  "FAIL: TMCCT should be reset to the initial-count"
  "FAIL: TMCCT should not be reset to TMICT value"

The root cause is that the APIC timer runs on wall-clock time under KVM,
so it keeps counting while the vCPU is not running. With the default
tmict=0x999999 the timer period is only ~10ms (at 1ns bus cycle).
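
For reference, the ~10ms figure can be checked with a small sketch like
the one below. The helper name and its parameters are hypothetical (they
are not part of x86/apic.c), and it assumes one count per bus cycle, i.e.
a divide-by-1 timer configuration:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from x86/apic.c: convert an APIC timer
 * initial count into a period in nanoseconds.
 * Assumption: the timer decrements once per (bus cycle * divider).
 */
static uint64_t tmict_period_ns(uint32_t tmict, uint32_t divider,
                                uint64_t bus_cycle_ns)
{
    return (uint64_t)tmict * divider * bus_cycle_ns;
}
```

With tmict=0x999999, divider 1 and a 1ns bus cycle this gives 10066329ns,
which is where the "~10ms" above comes from.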

A vCPU stall that covers a sufficiently large portion of the TMICT period
(caused by e.g. host preemption, cross-socket migration or heavy CPU
contention) leads to false failures. It's basically not possible to
reliably sample timer values while the timer is running.

This series adds retry logic with increasing timer periods (10ms, 60ms,
700ms) so that transient vCPU stalls don't cause false failures, while
real bugs still get caught. Most false failures are handled by the 60ms
timer; 700ms covers one pathological case observed in a week of testing.
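
A rough sketch of the retry idea follows. It is not the code from the
patches: the names measure_fn, retry_tmict, measure_with_retry and
passes_at_or_above are made up for illustration, and the 60ms/700ms
initial counts assume 1ns per count. The measurement step only returns a
verdict, so the caller can report a single PASS/FAIL after the last
attempt:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical callback: runs one measurement pass with the given initial
 * count and returns true if all sampled timer values looked sane.
 */
typedef bool (*measure_fn)(uint32_t tmict, void *opaque);

/* Assumed initial counts for roughly 10ms, 60ms and 700ms at 1ns/count. */
static const uint32_t retry_tmict[] = { 0x999999, 60000000u, 700000000u };

/*
 * Try each period in turn and stop at the first pass that succeeds.
 * Nothing is reported here; *attempts (if non-NULL) tells the caller how
 * many passes were needed.
 */
static bool measure_with_retry(measure_fn measure, void *opaque,
                               size_t *attempts)
{
    size_t i;

    for (i = 0; i < sizeof(retry_tmict) / sizeof(retry_tmict[0]); i++) {
        if (attempts)
            *attempts = i + 1;
        if (measure(retry_tmict[i], opaque))
            return true;
    }
    return false;
}

/*
 * Stand-in for a real measurement: pretend the vCPU stalls long enough
 * that only periods of at least *(uint32_t *)opaque counts survive.
 */
static bool passes_at_or_above(uint32_t tmict, void *opaque)
{
    return tmict >= *(const uint32_t *)opaque;
}
```

With a simulated 50ms stall (threshold 50000000) the 10ms pass fails and
the 60ms pass succeeds on the second attempt. A condition that fails for
every period still ends in a failure, which is how a real bug would
continue to be caught.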

Reproducer (requires 2+ NUMA nodes):

  stress-ng --cpu 128 --timer 32 --hrtimers 32 --quiet &
  sleep 2
  while true; do
      /usr/libexec/qemu-kvm --no-reboot -nodefaults \
          -global kvm-pit.lost_tick_policy=discard \
          -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 \
          -display none -serial stdio -device pci-testdev \
          -machine q35 -kernel x86/apic.flat \
          -smp 1 -cpu qemu64,+x2apic,+tsc-deadline \
          >> apic_race.log 2>&1 &
      QEMU_PID=$!
      while kill -0 $QEMU_PID 2>/dev/null; do
          taskset -p -c 0 $QEMU_PID 2>/dev/null
          sleep 0.001
          taskset -p -c 1 $QEMU_PID 2>/dev/null
          sleep 0.001
      done
      wait $QEMU_PID 2>/dev/null
  done

The patches reduce the failure rate from ~4% (8 FAILs / 216 PASSes in
2 minutes) to 0 FAILs over thousands of runs.

Igor Mammedov (2):
  x86/apic: separate reporting from actual measurements
  x86/apic: add retry logic to test_apic_change_mode

 x86/apic.c | 119 ++++++++++++++++++++++++++++++++---------------------
 1 file changed, 72 insertions(+), 47 deletions(-)

-- 
2.47.3

