[PATCH 0/2] Fix the duplicate PMI injections in vPMU

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] Fix the duplicate PMI injections in vPMU
@ 2023-09-25 17:34 Mingwei Zhang
  2023-09-25 17:34 ` [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit Mingwei Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Mingwei Zhang @ 2023-09-25 17:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: H. Peter Anvin, kvm, linux-kernel, Jim Mattson, Mingwei Zhang,
	Dapeng Mi, Like Xu, Roman Kagan, Kan Liang, Dapeng1 Mi

When we do stress test on KVM vPMU using Intel vtune, we find the following
warning kernel message in the guest VM:

[ 1437.487320] Uhhuh. NMI received for unknown reason 20 on CPU 3.
[ 1437.487330] Dazed and confused, but trying to continue

The Problem
===========

The above issue indicates that there are more NMIs injected than guest
could recognize. After a month of investigation, we discovered that the
bug happened due to minor glitches in two separate parts of the KVM: 1)
KVM vPMU mistakenly fires a PMI due to emulated counter overflow even
though the overflow has already been fired by the PMI handler on the
host [1]. 2) KVM APIC allows multiple injections of PMI at one VM entry
which violates Intel SDM. Both glitches contributes to extra injection
of PMIs and thus confuses PMI handler in guest VM and causes the above
warning messages.

The Fixes
=========

The patches disallow the multi-PMI injection fundamentally at APIC
level. In addition, they also simplify the PMI injection process by
removing irq_work and only use KVM_REQ_PMI.

The Testing
===========

With the series applied, we do not see the above warning messages when
stress testing VM with Intel vtune. In addition, we add some kernel
printing, all emulated counter overflow happens when hardware counter
value is 0 and emulated counter value is 1 (prev_counter is -1). We
never observed unexpected prev_counter values we saw in [2].

Note that this series does break the upstream kvm-unit-tests/pmu with the
following error:

FAIL: Intel: emulated instruction: instruction counter overflow
FAIL: Intel: full-width writes: emulated instruction: instruction counter overflow

This is a test bug and apply the following diff should fix the issue:

diff --git a/x86/pmu.c b/x86/pmu.c
index 0def2869..667e6233 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -68,6 +68,7 @@ volatile uint64_t irq_received;
 static void cnt_overflow(isr_regs_t *regs)
 {
 »......irq_received++;
+»......apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
 »......apic_write(APIC_EOI, 0);
 }

We will post the above change soon.

[1] commit 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
[2] https://lore.kernel.org/all/CAL715WL9T8Ucnj_1AygwMgDjOJrttNZHRP9o-KUNfpx1aYZnog@mail.gmail.com/

Versioning
==========

The series is in v1. We made some changes:
 - drop Dapeng's reviewed-by, since code changes.
 - applies fix up in kvm_apic_local_deliver(). [seanjc]
 - remove pmc->prev_counter. [seanjc]

Previous version (v0) shown as follows:
 - [APIC patches v0]: https://lore.kernel.org/all/20230901185646.2823254-1-jmattson@google.com/
 - [vPMU patch v0]: https://lore.kernel.org/all/ZQ4A4KaSyygKHDUI@google.com/

Jim Mattson (2):
  KVM: x86: Synthesize at most one PMI per VM-exit
  KVM: x86: Mask LVTPC when handling a PMI

 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/lapic.c            |  8 ++++++--
 arch/x86/kvm/pmu.c              | 27 +--------------------------
 arch/x86/kvm/x86.c              |  3 +++
 4 files changed, 10 insertions(+), 29 deletions(-)

base-commit: 6de2ccc169683bf81feba163834dae7cdebdd826
-- 
2.42.0.515.g380fc7ccd1-goog

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit
  2023-09-25 17:34 [PATCH 0/2] Fix the duplicate PMI injections in vPMU Mingwei Zhang
@ 2023-09-25 17:34 ` Mingwei Zhang
  2023-09-25 17:59   ` Sean Christopherson
  2023-09-25 17:34 ` [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI Mingwei Zhang
  2023-09-28 16:41 ` [PATCH 0/2] Fix the duplicate PMI injections in vPMU Sean Christopherson
  2 siblings, 1 reply; 15+ messages in thread
From: Mingwei Zhang @ 2023-09-25 17:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: H. Peter Anvin, kvm, linux-kernel, Jim Mattson, Mingwei Zhang,
	Dapeng Mi, Like Xu, Roman Kagan, Kan Liang, Dapeng1 Mi

From: Jim Mattson <jmattson@google.com>

When the irq_work callback, kvm_pmi_trigger_fn(), is invoked during a
VM-exit that also invokes __kvm_perf_overflow() as a result of
instruction emulation, kvm_pmu_deliver_pmi() will be called twice
before the next VM-entry.

That shouldn't be a problem. The local APIC is supposed to
automatically set the mask flag in LVTPC when it handles a PMI, so the
second PMI should be inhibited. However, KVM's local APIC emulation
fails to set the mask flag in LVTPC when it handles a PMI, so two PMIs
are delivered via the local APIC. In the common case, where LVTPC is
configured to deliver an NMI, the first NMI is vectored through the
guest IDT, and the second one is held pending. When the NMI handler
returns, the second NMI is vectored through the IDT. For Linux guests,
this results in the "dazed and confused" spurious NMI message.

Though the obvious fix is to set the mask flag in LVTPC when handling
a PMI, KVM's logic around synthesizing a PMI is unnecessarily
convoluted.

Remove the irq_work callback for synthesizing a PMI, and all of the
logic for invoking it. Instead, to prevent a vcpu from leaving C0 with
a PMI pending, add a check for KVM_REQ_PMI to kvm_vcpu_has_events().

Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
Signed-off-by: Jim Mattson <jmattson@google.com>
Tested-by: Mingwei Zhang <mizhang@google.com>
Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/pmu.c              | 27 +--------------------------
 arch/x86/kvm/x86.c              |  3 +++
 3 files changed, 4 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 28bd38303d70..de951d6aa9a8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -528,7 +528,6 @@ struct kvm_pmu {
 	u64 raw_event_mask;
 	struct kvm_pmc gp_counters[KVM_INTEL_PMC_MAX_GENERIC];
 	struct kvm_pmc fixed_counters[KVM_PMC_MAX_FIXED];
-	struct irq_work irq_work;
 
 	/*
 	 * Overlay the bitmap with a 64-bit atomic so that all bits can be
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index edb89b51b383..9ae07db6f0f6 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -93,14 +93,6 @@ void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops)
 #undef __KVM_X86_PMU_OP
 }
 
-static void kvm_pmi_trigger_fn(struct irq_work *irq_work)
-{
-	struct kvm_pmu *pmu = container_of(irq_work, struct kvm_pmu, irq_work);
-	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
-
-	kvm_pmu_deliver_pmi(vcpu);
-}
-
 static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
@@ -124,20 +116,7 @@ static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 		__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
 	}
 
-	if (!pmc->intr || skip_pmi)
-		return;
-
-	/*
-	 * Inject PMI. If vcpu was in a guest mode during NMI PMI
-	 * can be ejected on a guest mode re-entry. Otherwise we can't
-	 * be sure that vcpu wasn't executing hlt instruction at the
-	 * time of vmexit and is not going to re-enter guest mode until
-	 * woken up. So we should wake it, but this is impossible from
-	 * NMI context. Do it from irq work instead.
-	 */
-	if (in_pmi && !kvm_handling_nmi_from_guest(pmc->vcpu))
-		irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
-	else
+	if (pmc->intr && !skip_pmi)
 		kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
 }
 
@@ -675,9 +654,6 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
 
 void kvm_pmu_reset(struct kvm_vcpu *vcpu)
 {
-	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
-
-	irq_work_sync(&pmu->irq_work);
 	static_call(kvm_x86_pmu_reset)(vcpu);
 }
 
@@ -687,7 +663,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
 
 	memset(pmu, 0, sizeof(*pmu));
 	static_call(kvm_x86_pmu_init)(vcpu);
-	init_irq_work(&pmu->irq_work, kvm_pmi_trigger_fn);
 	pmu->event_count = 0;
 	pmu->need_cleanup = false;
 	kvm_pmu_refresh(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6b9bea62fb8..6f24a8c1e136 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12820,6 +12820,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 		return true;
 #endif
 
+	if (kvm_test_request(KVM_REQ_PMI, vcpu))
+		return true;
+
 	if (kvm_arch_interrupt_allowed(vcpu) &&
 	    (kvm_cpu_has_interrupt(vcpu) ||
 	    kvm_guest_apic_has_interrupt(vcpu)))
-- 
2.42.0.515.g380fc7ccd1-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit
  2023-09-25 17:34 ` [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit Mingwei Zhang
@ 2023-09-25 17:59   ` Sean Christopherson
  2023-09-25 19:33     ` Mingwei Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2023-09-25 17:59 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, H. Peter Anvin, kvm, linux-kernel, Jim Mattson,
	Dapeng Mi, Like Xu, Roman Kagan, Kan Liang, Dapeng1 Mi

On Mon, Sep 25, 2023, Mingwei Zhang wrote:
> From: Jim Mattson <jmattson@google.com>
> 
> When the irq_work callback, kvm_pmi_trigger_fn(), is invoked during a
> VM-exit that also invokes __kvm_perf_overflow() as a result of
> instruction emulation, kvm_pmu_deliver_pmi() will be called twice
> before the next VM-entry.
> 
> That shouldn't be a problem. The local APIC is supposed to
> automatically set the mask flag in LVTPC when it handles a PMI, so the
> second PMI should be inhibited. However, KVM's local APIC emulation
> fails to set the mask flag in LVTPC when it handles a PMI, so two PMIs
> are delivered via the local APIC. In the common case, where LVTPC is
> configured to deliver an NMI, the first NMI is vectored through the
> guest IDT, and the second one is held pending. When the NMI handler
> returns, the second NMI is vectored through the IDT. For Linux guests,
> this results in the "dazed and confused" spurious NMI message.
> 
> Though the obvious fix is to set the mask flag in LVTPC when handling
> a PMI, KVM's logic around synthesizing a PMI is unnecessarily
> convoluted.

Unless Jim outright objects, I strongly prefer placing this patch second, with
the above two paragraphs replaced with my suggestion (or something similar):

  Calling kvm_pmu_deliver_pmi() twice is unlikely to be problematic now that
  KVM sets the LVTPC mask bit when delivering a PMI.  But using IRQ work to
  trigger the PMI is still broken, albeit very theoretically.

  E.g. if the self-IPI to trigger IRQ work is be delayed long enough for the
  vCPU to be migrated to a different pCPU, then it's possible for
  kvm_pmi_trigger_fn() to race with the kvm_pmu_deliver_pmi() from
  KVM_REQ_PMI and still generate two PMIs.

  KVM could set the mask bit using an atomic operation, but that'd just be
  piling on unnecessary code to workaround what is effectively a hack.  The
  *only* reason KVM uses IRQ work is to ensure the PMI is treated as a wake
  event, e.g. if the vCPU just executed HLT.

I understand Jim's desire for the patch to be more obviously valuable, but the
people that need convincing are already convinced that the patch is worth taking.

> Remove the irq_work callback for synthesizing a PMI, and all of the
> logic for invoking it. Instead, to prevent a vcpu from leaving C0 with
> a PMI pending, add a check for KVM_REQ_PMI to kvm_vcpu_has_events().
> 
> Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Tested-by: Mingwei Zhang <mizhang@google.com>
> Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>

Needs your SoB

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit
  2023-09-25 17:59   ` Sean Christopherson
@ 2023-09-25 19:33     ` Mingwei Zhang
  2023-09-25 21:28       ` Sean Christopherson
  0 siblings, 1 reply; 15+ messages in thread
From: Mingwei Zhang @ 2023-09-25 19:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, H. Peter Anvin, kvm, linux-kernel, Jim Mattson,
	Dapeng Mi, Like Xu, Roman Kagan, Kan Liang, Dapeng1 Mi

On Mon, Sep 25, 2023 at 10:59 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Sep 25, 2023, Mingwei Zhang wrote:
> > From: Jim Mattson <jmattson@google.com>
> >
> > When the irq_work callback, kvm_pmi_trigger_fn(), is invoked during a
> > VM-exit that also invokes __kvm_perf_overflow() as a result of
> > instruction emulation, kvm_pmu_deliver_pmi() will be called twice
> > before the next VM-entry.
> >
> > That shouldn't be a problem. The local APIC is supposed to
> > automatically set the mask flag in LVTPC when it handles a PMI, so the
> > second PMI should be inhibited. However, KVM's local APIC emulation
> > fails to set the mask flag in LVTPC when it handles a PMI, so two PMIs
> > are delivered via the local APIC. In the common case, where LVTPC is
> > configured to deliver an NMI, the first NMI is vectored through the
> > guest IDT, and the second one is held pending. When the NMI handler
> > returns, the second NMI is vectored through the IDT. For Linux guests,
> > this results in the "dazed and confused" spurious NMI message.
> >
> > Though the obvious fix is to set the mask flag in LVTPC when handling
> > a PMI, KVM's logic around synthesizing a PMI is unnecessarily
> > convoluted.
>
> Unless Jim outright objects, I strongly prefer placing this patch second, with
> the above two paragraphs replaced with my suggestion (or something similar):
>
>   Calling kvm_pmu_deliver_pmi() twice is unlikely to be problematic now that
>   KVM sets the LVTPC mask bit when delivering a PMI.  But using IRQ work to
>   trigger the PMI is still broken, albeit very theoretically.
>
>   E.g. if the self-IPI to trigger IRQ work is be delayed long enough for the
>   vCPU to be migrated to a different pCPU, then it's possible for
>   kvm_pmi_trigger_fn() to race with the kvm_pmu_deliver_pmi() from
>   KVM_REQ_PMI and still generate two PMIs.
>
>   KVM could set the mask bit using an atomic operation, but that'd just be
>   piling on unnecessary code to workaround what is effectively a hack.  The
>   *only* reason KVM uses IRQ work is to ensure the PMI is treated as a wake
>   event, e.g. if the vCPU just executed HLT.
>
> I understand Jim's desire for the patch to be more obviously valuable, but the
> people that need convincing are already convinced that the patch is worth taking.
>
> > Remove the irq_work callback for synthesizing a PMI, and all of the
> > logic for invoking it. Instead, to prevent a vcpu from leaving C0 with
> > a PMI pending, add a check for KVM_REQ_PMI to kvm_vcpu_has_events().
> >
> > Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > Tested-by: Mingwei Zhang <mizhang@google.com>
> > Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>
> Needs your SoB

Signed-off-by: Mingwei Zhang <mizhang@google.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit
  2023-09-25 19:33     ` Mingwei Zhang
@ 2023-09-25 21:28       ` Sean Christopherson
  0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2023-09-25 21:28 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, H. Peter Anvin, kvm, linux-kernel, Jim Mattson,
	Dapeng Mi, Like Xu, Roman Kagan, Kan Liang, Dapeng1 Mi

On Mon, Sep 25, 2023, Mingwei Zhang wrote:
> On Mon, Sep 25, 2023 at 10:59 AM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, Sep 25, 2023, Mingwei Zhang wrote:
> > > From: Jim Mattson <jmattson@google.com>
> > >
> > > When the irq_work callback, kvm_pmi_trigger_fn(), is invoked during a
> > > VM-exit that also invokes __kvm_perf_overflow() as a result of
> > > instruction emulation, kvm_pmu_deliver_pmi() will be called twice
> > > before the next VM-entry.
> > >
> > > That shouldn't be a problem. The local APIC is supposed to
> > > automatically set the mask flag in LVTPC when it handles a PMI, so the
> > > second PMI should be inhibited. However, KVM's local APIC emulation
> > > fails to set the mask flag in LVTPC when it handles a PMI, so two PMIs
> > > are delivered via the local APIC. In the common case, where LVTPC is
> > > configured to deliver an NMI, the first NMI is vectored through the
> > > guest IDT, and the second one is held pending. When the NMI handler
> > > returns, the second NMI is vectored through the IDT. For Linux guests,
> > > this results in the "dazed and confused" spurious NMI message.
> > >
> > > Though the obvious fix is to set the mask flag in LVTPC when handling
> > > a PMI, KVM's logic around synthesizing a PMI is unnecessarily
> > > convoluted.
> >
> > Unless Jim outright objects, I strongly prefer placing this patch second, with
> > the above two paragraphs replaced with my suggestion (or something similar):
> >
> >   Calling kvm_pmu_deliver_pmi() twice is unlikely to be problematic now that
> >   KVM sets the LVTPC mask bit when delivering a PMI.  But using IRQ work to
> >   trigger the PMI is still broken, albeit very theoretically.
> >
> >   E.g. if the self-IPI to trigger IRQ work is be delayed long enough for the
> >   vCPU to be migrated to a different pCPU, then it's possible for
> >   kvm_pmi_trigger_fn() to race with the kvm_pmu_deliver_pmi() from
> >   KVM_REQ_PMI and still generate two PMIs.
> >
> >   KVM could set the mask bit using an atomic operation, but that'd just be
> >   piling on unnecessary code to workaround what is effectively a hack.  The
> >   *only* reason KVM uses IRQ work is to ensure the PMI is treated as a wake
> >   event, e.g. if the vCPU just executed HLT.
> >
> > I understand Jim's desire for the patch to be more obviously valuable, but the
> > people that need convincing are already convinced that the patch is worth taking.
> >
> > > Remove the irq_work callback for synthesizing a PMI, and all of the
> > > logic for invoking it. Instead, to prevent a vcpu from leaving C0 with
> > > a PMI pending, add a check for KVM_REQ_PMI to kvm_vcpu_has_events().
> > >
> > > Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
> > > Signed-off-by: Jim Mattson <jmattson@google.com>
> > > Tested-by: Mingwei Zhang <mizhang@google.com>
> > > Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> >
> > Needs your SoB
> 
> Signed-off-by: Mingwei Zhang <mizhang@google.com>

Thanks!

Jim gave his blessing off-list for swapping the order, I'll do that and massage
the changelogs when applying, i.e. no need for a v3.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI
  2023-09-25 17:34 [PATCH 0/2] Fix the duplicate PMI injections in vPMU Mingwei Zhang
  2023-09-25 17:34 ` [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit Mingwei Zhang
@ 2023-09-25 17:34 ` Mingwei Zhang
  2023-09-25 17:52   ` Sean Christopherson
  2023-09-28 16:41 ` [PATCH 0/2] Fix the duplicate PMI injections in vPMU Sean Christopherson
  2 siblings, 1 reply; 15+ messages in thread
From: Mingwei Zhang @ 2023-09-25 17:34 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: H. Peter Anvin, kvm, linux-kernel, Jim Mattson, Mingwei Zhang,
	Dapeng Mi, Like Xu, Roman Kagan, Kan Liang, Dapeng1 Mi

From: Jim Mattson <jmattson@google.com>

Per the SDM, "When the local APIC handles a performance-monitoring
counters interrupt, it automatically sets the mask flag in the LVT
performance counter register."

Add this behavior to KVM's local APIC emulation, to reduce the
incidence of "dazed and confused" spurious NMI warnings in Linux
guests (at least, those that use a PMI handler with "late_ack").

Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")
Signed-off-by: Jim Mattson <jmattson@google.com>
Tested-by: Mingwei Zhang <mizhang@google.com>
---
 arch/x86/kvm/lapic.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 113ca9661ab2..1f3d56a1f45f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2729,13 +2729,17 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
 {
 	u32 reg = kvm_lapic_get_reg(apic, lvt_type);
 	int vector, mode, trig_mode;
+	int r;
 
 	if (kvm_apic_hw_enabled(apic) && !(reg & APIC_LVT_MASKED)) {
 		vector = reg & APIC_VECTOR_MASK;
 		mode = reg & APIC_MODE_MASK;
 		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
-		return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
-					NULL);
+
+		r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
+		if (r && lvt_type == APIC_LVTPC)
+			kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);
+		return r;
 	}
 	return 0;
 }
-- 
2.42.0.515.g380fc7ccd1-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI
  2023-09-25 17:34 ` [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI Mingwei Zhang
@ 2023-09-25 17:52   ` Sean Christopherson
  2023-09-25 19:34     ` Mingwei Zhang
  0 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2023-09-25 17:52 UTC (permalink / raw)
  To: Mingwei Zhang
  Cc: Paolo Bonzini, H. Peter Anvin, kvm, linux-kernel, Jim Mattson,
	Dapeng Mi, Like Xu, Roman Kagan, Kan Liang, Dapeng1 Mi

On Mon, Sep 25, 2023, Mingwei Zhang wrote:
> From: Jim Mattson <jmattson@google.com>
> 
> Per the SDM, "When the local APIC handles a performance-monitoring
> counters interrupt, it automatically sets the mask flag in the LVT
> performance counter register."
> 
> Add this behavior to KVM's local APIC emulation, to reduce the
> incidence of "dazed and confused" spurious NMI warnings in Linux
> guests (at least, those that use a PMI handler with "late_ack").
> 
> Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")

This Fixes is wrong.  Prior to commit f5132b01386b ("KVM: Expose a version 2
architectural PMU to a guests"), KVM didn't ever deliver interrupts via the LVTPC
entry.  E.g. prior to that commit, the only reference to APIC_LVTPC is in
kvm_lapic_reg_write:

  arch/x86/kvm $ git grep APIC_LVTPC f5132b01386b^
  f5132b01386b^:lapic.c:  case APIC_LVTPC:

Commit 23930f9521c9 definitely set the PMU support up to fail, but the bug would
never have existed if kvm_deliver_pmi() had been written as:

void kvm_deliver_pmi(struct kvm_vcpu *vcpu)
{
	struct kvm_lapic *apic = vcpu->arch.apic;

	if (apic && kvm_apic_local_deliver(apic, APIC_LVTPC))
		kvm_lapic_set_reg(apic, APIC_LVTPC,
				  kvm_lapic_get_reg(apic, LVTPC) | APIC_LVT_MASKED);
}

And this needs an explicit Cc: to stable because KVM opts out of AUTOSEL.

So

  Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
  Cc: stable@vger.kernel.org

> Signed-off-by: Jim Mattson <jmattson@google.com>
> Tested-by: Mingwei Zhang <mizhang@google.com>

When posting patches on behalf of others, you need to provide your SoB.

> ---
>  arch/x86/kvm/lapic.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 113ca9661ab2..1f3d56a1f45f 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2729,13 +2729,17 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
>  {
>  	u32 reg = kvm_lapic_get_reg(apic, lvt_type);
>  	int vector, mode, trig_mode;
> +	int r;
>  
>  	if (kvm_apic_hw_enabled(apic) && !(reg & APIC_LVT_MASKED)) {
>  		vector = reg & APIC_VECTOR_MASK;
>  		mode = reg & APIC_MODE_MASK;
>  		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
> -		return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
> -					NULL);
> +
> +		r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
> +		if (r && lvt_type == APIC_LVTPC)
> +			kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);

Belated feedback, I think I'd prefer to write this as

			kvm_lapic_set_reg(apic, APIC_LVTPC, reg | APIC_LVT_MASKED);

so that this code will show up when searching for APIC_LVTPC.

> +		return r;
>  	}
>  	return 0;
>  }
> -- 
> 2.42.0.515.g380fc7ccd1-goog
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI
  2023-09-25 17:52   ` Sean Christopherson
@ 2023-09-25 19:34     ` Mingwei Zhang
  0 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2023-09-25 19:34 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, H. Peter Anvin, kvm, linux-kernel, Jim Mattson,
	Dapeng Mi, Like Xu, Roman Kagan, Kan Liang, Dapeng1 Mi

On Mon, Sep 25, 2023 at 10:52 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Mon, Sep 25, 2023, Mingwei Zhang wrote:
> > From: Jim Mattson <jmattson@google.com>
> >
> > Per the SDM, "When the local APIC handles a performance-monitoring
> > counters interrupt, it automatically sets the mask flag in the LVT
> > performance counter register."
> >
> > Add this behavior to KVM's local APIC emulation, to reduce the
> > incidence of "dazed and confused" spurious NMI warnings in Linux
> > guests (at least, those that use a PMI handler with "late_ack").
> >
> > Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")
>
> This Fixes is wrong.  Prior to commit f5132b01386b ("KVM: Expose a version 2
> architectural PMU to a guests"), KVM didn't ever deliver interrupts via the LVTPC
> entry.  E.g. prior to that commit, the only reference to APIC_LVTPC is in
> kvm_lapic_reg_write:
>
>   arch/x86/kvm $ git grep APIC_LVTPC f5132b01386b^
>   f5132b01386b^:lapic.c:  case APIC_LVTPC:
>
> Commit 23930f9521c9 definitely set the PMU support up to fail, but the bug would
> never have existed if kvm_deliver_pmi() had been written as:
>
> void kvm_deliver_pmi(struct kvm_vcpu *vcpu)
> {
>         struct kvm_lapic *apic = vcpu->arch.apic;
>
>         if (apic && kvm_apic_local_deliver(apic, APIC_LVTPC))
>                 kvm_lapic_set_reg(apic, APIC_LVTPC,
>                                   kvm_lapic_get_reg(apic, LVTPC) | APIC_LVT_MASKED);
> }
>
> And this needs an explicit Cc: to stable because KVM opts out of AUTOSEL.
>
> So
>
>   Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
>   Cc: stable@vger.kernel.org
>
> > Signed-off-by: Jim Mattson <jmattson@google.com>
> > Tested-by: Mingwei Zhang <mizhang@google.com>
>
> When posting patches on behalf of others, you need to provide your SoB.
>
> > ---
> >  arch/x86/kvm/lapic.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 113ca9661ab2..1f3d56a1f45f 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2729,13 +2729,17 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
> >  {
> >       u32 reg = kvm_lapic_get_reg(apic, lvt_type);
> >       int vector, mode, trig_mode;
> > +     int r;
> >
> >       if (kvm_apic_hw_enabled(apic) && !(reg & APIC_LVT_MASKED)) {
> >               vector = reg & APIC_VECTOR_MASK;
> >               mode = reg & APIC_MODE_MASK;
> >               trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
> > -             return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
> > -                                     NULL);
> > +
> > +             r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
> > +             if (r && lvt_type == APIC_LVTPC)
> > +                     kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);
>
> Belated feedback, I think I'd prefer to write this as
>
>                         kvm_lapic_set_reg(apic, APIC_LVTPC, reg | APIC_LVT_MASKED);
>
> so that this code will show up when searching for APIC_LVTPC.
>
> > +             return r;
> >       }
> >       return 0;
> >  }
> > --
> > 2.42.0.515.g380fc7ccd1-goog
> >
Signed-off-by: Mingwei Zhang <mizhang@google.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/2] Fix the duplicate PMI injections in vPMU
  2023-09-25 17:34 [PATCH 0/2] Fix the duplicate PMI injections in vPMU Mingwei Zhang
  2023-09-25 17:34 ` [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit Mingwei Zhang
  2023-09-25 17:34 ` [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI Mingwei Zhang
@ 2023-09-28 16:41 ` Sean Christopherson
  2 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2023-09-28 16:41 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, Mingwei Zhang
  Cc: H. Peter Anvin, kvm, linux-kernel, Jim Mattson, Dapeng Mi,
	Like Xu, Roman Kagan, Kan Liang, Dapeng1 Mi

On Mon, 25 Sep 2023 17:34:45 +0000, Mingwei Zhang wrote:
> When we do stress test on KVM vPMU using Intel vtune, we find the following
> warning kernel message in the guest VM:
> 
> [ 1437.487320] Uhhuh. NMI received for unknown reason 20 on CPU 3.
> [ 1437.487330] Dazed and confused, but trying to continue
> 
> The Problem
> ===========
> 
> [...]

Applied to kvm-x86 pmu, with the order swapped and a bit of changelog massaging.
Thanks!

[1/2] KVM: x86: Mask LVTPC when handling a PMI
      https://github.com/kvm-x86/linux/commit/a16eb25b09c0
[2/2] KVM: x86/pmu: Synthesize at most one PMI per VM-exit
      https://github.com/kvm-x86/linux/commit/73554b29bd70

--
https://github.com/kvm-x86/linux/tree/next

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit
@ 2023-09-01 18:56 Jim Mattson
  2023-09-01 18:56 ` [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI Jim Mattson
  0 siblings, 1 reply; 15+ messages in thread
From: Jim Mattson @ 2023-09-01 18:56 UTC (permalink / raw)
  To: kvm, Sean Christopherson, Paolo Bonzini
  Cc: Like Xu, Mingwei Zhang, Roman Kagan, Kan Liang, Dapeng1 Mi,
	Jim Mattson

When the irq_work callback, kvm_pmi_trigger_fn(), is invoked during a
VM-exit that also invokes __kvm_perf_overflow() as a result of
instruction emulation, kvm_pmu_deliver_pmi() will be called twice
before the next VM-entry.

That shouldn't be a problem. The local APIC is supposed to
automatically set the mask flag in LVTPC when it handles a PMI, so the
second PMI should be inhibited. However, KVM's local APIC emulation
fails to set the mask flag in LVTPC when it handles a PMI, so two PMIs
are delivered via the local APIC. In the common case, where LVTPC is
configured to deliver an NMI, the first NMI is vectored through the
guest IDT, and the second one is held pending. When the NMI handler
returns, the second NMI is vectored through the IDT. For Linux guests,
this results in the "dazed and confused" spurious NMI message.

Though the obvious fix is to set the mask flag in LVTPC when handling
a PMI, KVM's logic around synthesizing a PMI is unnecessarily
convoluted.

Remove the irq_work callback for synthesizing a PMI, and all of the
logic for invoking it. Instead, to prevent a vcpu from leaving C0 with
a PMI pending, add a check for KVM_REQ_PMI to kvm_vcpu_has_events().

Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/pmu.c              | 27 +--------------------------
 arch/x86/kvm/x86.c              |  3 +++
 3 files changed, 4 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3bc146dfd38d..f6b9e3ae08bf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -528,7 +528,6 @@ struct kvm_pmu {
 	u64 raw_event_mask;
 	struct kvm_pmc gp_counters[KVM_INTEL_PMC_MAX_GENERIC];
 	struct kvm_pmc fixed_counters[KVM_PMC_MAX_FIXED];
-	struct irq_work irq_work;
 
 	/*
 	 * Overlay the bitmap with a 64-bit atomic so that all bits can be
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index bf653df86112..0c117cd24077 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -93,14 +93,6 @@ void kvm_pmu_ops_update(const struct kvm_pmu_ops *pmu_ops)
 #undef __KVM_X86_PMU_OP
 }
 
-static void kvm_pmi_trigger_fn(struct irq_work *irq_work)
-{
-	struct kvm_pmu *pmu = container_of(irq_work, struct kvm_pmu, irq_work);
-	struct kvm_vcpu *vcpu = pmu_to_vcpu(pmu);
-
-	kvm_pmu_deliver_pmi(vcpu);
-}
-
 static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
@@ -124,20 +116,7 @@ static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 		__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
 	}
 
-	if (!pmc->intr || skip_pmi)
-		return;
-
-	/*
-	 * Inject PMI. If vcpu was in a guest mode during NMI PMI
-	 * can be ejected on a guest mode re-entry. Otherwise we can't
-	 * be sure that vcpu wasn't executing hlt instruction at the
-	 * time of vmexit and is not going to re-enter guest mode until
-	 * woken up. So we should wake it, but this is impossible from
-	 * NMI context. Do it from irq work instead.
-	 */
-	if (in_pmi && !kvm_handling_nmi_from_guest(pmc->vcpu))
-		irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
-	else
+	if (pmc->intr && !skip_pmi)
 		kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
 }
 
@@ -677,9 +656,6 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
 
 void kvm_pmu_reset(struct kvm_vcpu *vcpu)
 {
-	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
-
-	irq_work_sync(&pmu->irq_work);
 	static_call(kvm_x86_pmu_reset)(vcpu);
 }
 
@@ -689,7 +665,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
 
 	memset(pmu, 0, sizeof(*pmu));
 	static_call(kvm_x86_pmu_init)(vcpu);
-	init_irq_work(&pmu->irq_work, kvm_pmi_trigger_fn);
 	pmu->event_count = 0;
 	pmu->need_cleanup = false;
 	kvm_pmu_refresh(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c381770bcbf1..0732c09fbd2d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12841,6 +12841,9 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 		return true;
 #endif
 
+	if (kvm_test_request(KVM_REQ_PMI, vcpu))
+		return true;
+
 	if (kvm_arch_interrupt_allowed(vcpu) &&
 	    (kvm_cpu_has_interrupt(vcpu) ||
 	    kvm_guest_apic_has_interrupt(vcpu)))
-- 
2.42.0.283.g2d96d420d3-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI
  2023-09-01 18:56 [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit Jim Mattson
@ 2023-09-01 18:56 ` Jim Mattson
  2023-09-02 19:06   ` Mingwei Zhang
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Jim Mattson @ 2023-09-01 18:56 UTC (permalink / raw)
  To: kvm, Sean Christopherson, Paolo Bonzini
  Cc: Like Xu, Mingwei Zhang, Roman Kagan, Kan Liang, Dapeng1 Mi,
	Jim Mattson

Per the SDM, "When the local APIC handles a performance-monitoring
counters interrupt, it automatically sets the mask flag in the LVT
performance counter register."

Add this behavior to KVM's local APIC emulation, to reduce the
incidence of "dazed and confused" spurious NMI warnings in Linux
guests (at least, those that use a PMI handler with "late_ack").

Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")
Signed-off-by: Jim Mattson <jmattson@google.com>
---
 arch/x86/kvm/lapic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a983a16163b1..1a79ec54ae1e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2743,6 +2743,8 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
 		vector = reg & APIC_VECTOR_MASK;
 		mode = reg & APIC_MODE_MASK;
 		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
+		if (lvt_type == APIC_LVTPC)
+			kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);
 		return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
 					NULL);
 	}
-- 
2.42.0.283.g2d96d420d3-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI
  2023-09-01 18:56 ` [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI Jim Mattson
@ 2023-09-02 19:06   ` Mingwei Zhang
  2023-09-06  8:59   ` Mi, Dapeng1
  2023-09-22 18:22   ` Sean Christopherson
  2 siblings, 0 replies; 15+ messages in thread
From: Mingwei Zhang @ 2023-09-02 19:06 UTC (permalink / raw)
  To: Jim Mattson
  Cc: kvm, Sean Christopherson, Paolo Bonzini, Like Xu, Roman Kagan,
	Kan Liang, Dapeng1 Mi

On Fri, Sep 01, 2023, Jim Mattson wrote:
> Per the SDM, "When the local APIC handles a performance-monitoring
> counters interrupt, it automatically sets the mask flag in the LVT
> performance counter register."
> 
> Add this behavior to KVM's local APIC emulation, to reduce the
> incidence of "dazed and confused" spurious NMI warnings in Linux
> guests (at least, those that use a PMI handler with "late_ack").
> 
> Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")
> Signed-off-by: Jim Mattson <jmattson@google.com>
Tested-by: Mingwei Zhang <mizhang@google.com>

I see consistent number of PMIs and NMIs when running perf on an idle
VM.
> ---
>  arch/x86/kvm/lapic.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index a983a16163b1..1a79ec54ae1e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2743,6 +2743,8 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
>  		vector = reg & APIC_VECTOR_MASK;
>  		mode = reg & APIC_MODE_MASK;
>  		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
> +		if (lvt_type == APIC_LVTPC)
> +			kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);
>  		return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
>  					NULL);
>  	}
> -- 
> 2.42.0.283.g2d96d420d3-goog
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI
  2023-09-01 18:56 ` [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI Jim Mattson
  2023-09-02 19:06   ` Mingwei Zhang
@ 2023-09-06  8:59   ` Mi, Dapeng1
  2023-09-22 18:22   ` Sean Christopherson
  2 siblings, 0 replies; 15+ messages in thread
From: Mi, Dapeng1 @ 2023-09-06  8:59 UTC (permalink / raw)
  To: Jim Mattson, kvm@vger.kernel.org, Christopherson,, Sean,
	Paolo Bonzini
  Cc: Xu, Like, Mingwei Zhang, Roman Kagan, Liang, Kan

> -----Original Message-----
> From: Jim Mattson <jmattson@google.com>
> Sent: Saturday, September 2, 2023 2:57 AM
> To: kvm@vger.kernel.org; Christopherson,, Sean <seanjc@google.com>; Paolo
> Bonzini <pbonzini@redhat.com>
> Cc: Xu, Like <likexu@tencent.com>; Mingwei Zhang <mizhang@google.com>;
> Roman Kagan <rkagan@amazon.de>; Liang, Kan <kan.liang@intel.com>; Mi,
> Dapeng1 <dapeng1.mi@intel.com>; Jim Mattson <jmattson@google.com>
> Subject: [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI
> 
> Per the SDM, "When the local APIC handles a performance-monitoring counters
> interrupt, it automatically sets the mask flag in the LVT performance counter
> register."
> 
> Add this behavior to KVM's local APIC emulation, to reduce the incidence of
> "dazed and confused" spurious NMI warnings in Linux guests (at least, those that
> use a PMI handler with "late_ack").
> 
> Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/lapic.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index
> a983a16163b1..1a79ec54ae1e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2743,6 +2743,8 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int
> lvt_type)
>  		vector = reg & APIC_VECTOR_MASK;
>  		mode = reg & APIC_MODE_MASK;
>  		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
> +		if (lvt_type == APIC_LVTPC)
> +			kvm_lapic_set_reg(apic, lvt_type, reg |
> APIC_LVT_MASKED);

Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>

>  		return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
>  					NULL);
>  	}
> --
> 2.42.0.283.g2d96d420d3-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI
  2023-09-01 18:56 ` [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI Jim Mattson
  2023-09-02 19:06   ` Mingwei Zhang
  2023-09-06  8:59   ` Mi, Dapeng1
@ 2023-09-22 18:22   ` Sean Christopherson
  2023-09-25 17:52     ` Jim Mattson
  2 siblings, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2023-09-22 18:22 UTC (permalink / raw)
  To: Jim Mattson
  Cc: kvm, Paolo Bonzini, Like Xu, Mingwei Zhang, Roman Kagan,
	Kan Liang, Dapeng1 Mi

On Fri, Sep 01, 2023, Jim Mattson wrote:
> Per the SDM, "When the local APIC handles a performance-monitoring
> counters interrupt, it automatically sets the mask flag in the LVT
> performance counter register."
> 
> Add this behavior to KVM's local APIC emulation, to reduce the
> incidence of "dazed and confused" spurious NMI warnings in Linux
> guests (at least, those that use a PMI handler with "late_ack").

Hmm, I don't like the "to reduce the incidence" language as that suggests that
this isn't a hard requirement.  That makes it sound like KVM is doing the guest
a favor.  This?

    Per the SDM, "When the local APIC handles a performance-monitoring
    counters interrupt, it automatically sets the mask flag in the LVT
    performance counter register."  Add this behavior to KVM's local APIC
    emulation.
    
    Failure to mask the LVTPC entry results in spurious PMIs, e.g. when
    running Linux as a guest, PMI handlers that do a "late_ack" spew a large
    number of "dazed and confused" spurious NMI warnings.

> Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
>  arch/x86/kvm/lapic.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index a983a16163b1..1a79ec54ae1e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2743,6 +2743,8 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
>  		vector = reg & APIC_VECTOR_MASK;
>  		mode = reg & APIC_MODE_MASK;
>  		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
> +		if (lvt_type == APIC_LVTPC)
> +			kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);

IMO, unconditionally setting the masked bit is wrong.  I'm 99% certain KVM should
only set the mask bit if an interrupt is actually delivered somehwere.

__apic_accept_irq() "fails" as follows:

  APIC_DM_LOWEST and APIC_DM_FIXED
	1. Trigger Mode is "level" triggered and level is deasserted.  This can't
           happen because this code hardcodes the level to '1'.

        2. APIC is disabled (H/W or S/W).  This is a non-issue because the H/W
           disabled case ignores it entirely, and all masks are defined to be set
           if the APIC is S/W disabled (per the SDM):

           The mask bits for all the LVT entries are set. Attempts to reset these
           bits will be ignored.


  APIC_DM_SMI
        1. If SMI injection fails because CONFIG_KVM_SMM=n.

  APIC_DM_INIT
        1. Trigger Mode is "level" triggered and level is deasserted.  As above,
           this can't happen.

  APIC_DM_EXTINT
        1. Unconditionally ignored by KVM.  This is architecturally correct for
           the LVTPC as the SDM says:

           Not supported for the LVT CMCI register, the LVT thermal monitor
           register, or the LVT performance counter register.

So basically, failure happens if and only if the guest attempts to send an SMI or
ExtINT that KVM ignores.  The SDM doesn't explicitly state that mask bit is left
unset in these cases, but my reading of

  When the local APIC handles a performance-monitoring counters interrupt, it
  automatically sets the mask flag in the LVT performance counter register.

is that there has to be an actual interrupt.

I highly doubt any software will ever care, e.g. the guest is going to be unhappy
in CONFIG_KVM_SMM=n case no matter what, but setting the mask bit without actually
triggering an interrupt seems odd.

This as fixup?

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 71d87b0db0d9..ebfc3d92a266 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2759,15 +2759,17 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
 {
        u32 reg = kvm_lapic_get_reg(apic, lvt_type);
        int vector, mode, trig_mode;
+       int r;
 
        if (kvm_apic_hw_enabled(apic) && !(reg & APIC_LVT_MASKED)) {
                vector = reg & APIC_VECTOR_MASK;
                mode = reg & APIC_MODE_MASK;
                trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
-               if (lvt_type == APIC_LVTPC)
+
+               r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
+               if (r && lvt_type == APIC_LVTPC)
                        kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);
-               return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
-                                       NULL);
+               return r;
        }
        return 0;
 }


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI
  2023-09-22 18:22   ` Sean Christopherson
@ 2023-09-25 17:52     ` Jim Mattson
  2023-09-25 18:00       ` Sean Christopherson
  0 siblings, 1 reply; 15+ messages in thread
From: Jim Mattson @ 2023-09-25 17:52 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Paolo Bonzini, Like Xu, Mingwei Zhang, Roman Kagan,
	Kan Liang, Dapeng1 Mi

On Fri, Sep 22, 2023 at 11:22 AM Sean Christopherson <seanjc@google.com> wrote:
>The SDM doesn't explicitly state that mask bit is left
> unset in these cases, but my reading of
>
>   When the local APIC handles a performance-monitoring counters interrupt, it
>   automatically sets the mask flag in the LVT performance counter register.
>
> is that there has to be an actual interrupt.

I assume you mean an actual interrupt from the APIC, not an actual PMI
to the APIC.

I would argue that one way of "handling" a performance-monitoring
counters interrupt is to do nothing with it, but apparently I'm wrong.
At least, if I set the delivery mode in LVTPC to the illegal value of
6, I never see the LVTPC.MASK bit get set.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI
  2023-09-25 17:52     ` Jim Mattson
@ 2023-09-25 18:00       ` Sean Christopherson
  0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2023-09-25 18:00 UTC (permalink / raw)
  To: Jim Mattson
  Cc: kvm, Paolo Bonzini, Like Xu, Mingwei Zhang, Roman Kagan,
	Kan Liang, Dapeng1 Mi

On Mon, Sep 25, 2023, Jim Mattson wrote:
> On Fri, Sep 22, 2023 at 11:22 AM Sean Christopherson <seanjc@google.com> wrote:
> >The SDM doesn't explicitly state that mask bit is left
> > unset in these cases, but my reading of
> >
> >   When the local APIC handles a performance-monitoring counters interrupt, it
> >   automatically sets the mask flag in the LVT performance counter register.
> >
> > is that there has to be an actual interrupt.
> 
> I assume you mean an actual interrupt from the APIC, not an actual PMI
> to the APIC.

Yeah.

> I would argue that one way of "handling" a performance-monitoring
> counters interrupt is to do nothing with it, but apparently I'm wrong.
> At least, if I set the delivery mode in LVTPC to the illegal value of
> 6, I never see the LVTPC.MASK bit get set.

Heh, you could complain to Intel and see if they'll change "handles" to "delivers" :-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-09-28 16:42 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-25 17:34 [PATCH 0/2] Fix the duplicate PMI injections in vPMU Mingwei Zhang
2023-09-25 17:34 ` [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit Mingwei Zhang
2023-09-25 17:59   ` Sean Christopherson
2023-09-25 19:33     ` Mingwei Zhang
2023-09-25 21:28       ` Sean Christopherson
2023-09-25 17:34 ` [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI Mingwei Zhang
2023-09-25 17:52   ` Sean Christopherson
2023-09-25 19:34     ` Mingwei Zhang
2023-09-28 16:41 ` [PATCH 0/2] Fix the duplicate PMI injections in vPMU Sean Christopherson
  -- strict thread matches above, loose matches on Subject: below --
2023-09-01 18:56 [PATCH 1/2] KVM: x86: Synthesize at most one PMI per VM-exit Jim Mattson
2023-09-01 18:56 ` [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI Jim Mattson
2023-09-02 19:06   ` Mingwei Zhang
2023-09-06  8:59   ` Mi, Dapeng1
2023-09-22 18:22   ` Sean Christopherson
2023-09-25 17:52     ` Jim Mattson
2023-09-25 18:00       ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox