kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active
@ 2024-07-19 23:50 Sean Christopherson
  2024-07-19 23:50 ` [PATCH v2 01/10] KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits Sean Christopherson
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-07-19 23:50 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

I made the mistake of expanding my testing to run with and without AVIC
enabled, and to my surprise (wow, sarcasm), x2AVIC failed hard on the
xapic_state_test due to ICR issues.

AFAICT, the issue is that AMD splits the 64-bit ICR into the legacy ICR
and ICR2 fields when storing the ICR in the vAPIC (apparently "it's a
single 64-bit register" is open to intepretation).  Aside from causing
the selftest failure and potential live migration issues, botching the
format is quite bad, as KVM will mishandle incomplete virtualized IPIs,
e.g. generate IRQs to the wrong vCPU, drop IRQs, etc.

Patch 1 fixes are rather annoying wart where the xapic_state *deliberately*
skips reserved bit tests to work around a KVM bug.  *sigh*

I couldn't find anything definitive in the APM, my findings are based on
testing on Genoa.

v2: Actually send the whole series.

Sean Christopherson (10):
  KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits
  KVM: x86: Move x2APIC ICR helper above kvm_apic_write_nodecode()
  KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)
  KVM: selftests: Open code vcpu_run() equivalent in guest_printf test
  KVM: selftests: Report unhandled exceptions on x86 as regular guest
    asserts
  KVM: selftests: Add x86 helpers to play nice with x2APIC MSR #GPs
  KVM: selftests: Skip ICR.BUSY test in xapic_state_test if x2APIC is
    enabled
  KVM: selftests: Test x2APIC ICR reserved bits
  KVM: selftests: Verify the guest can read back the x2APIC ICR it wrote
  KVM: selftests: Play nice with AMD's AVIC errata

 arch/x86/include/asm/kvm_host.h               |  2 +
 arch/x86/kvm/lapic.c                          | 73 +++++++++++++------
 arch/x86/kvm/svm/svm.c                        |  2 +
 arch/x86/kvm/vmx/main.c                       |  2 +
 .../testing/selftests/kvm/guest_print_test.c  | 19 ++++-
 .../selftests/kvm/include/x86_64/apic.h       | 21 +++++-
 .../selftests/kvm/lib/x86_64/processor.c      |  8 +-
 .../selftests/kvm/x86_64/xapic_state_test.c   | 54 +++++++++-----
 8 files changed, 135 insertions(+), 46 deletions(-)


base-commit: 332d2c1d713e232e163386c35a3ba0c1b90df83f
-- 
2.45.2.1089.g2a221341d9-goog


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 01/10] KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits
  2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
@ 2024-07-19 23:50 ` Sean Christopherson
  2024-11-01 18:34   ` Sean Christopherson
  2024-07-19 23:50 ` [PATCH v2 02/10] KVM: x86: Move x2APIC ICR helper above kvm_apic_write_nodecode() Sean Christopherson
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-07-19 23:50 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

Inject a #GP on a WRMSR(ICR) that attempts to set any reserved bits that
are must-be-zero on both Intel and AMD, i.e. any reserved bits other than
the BUSY bit, which Intel ignores and basically says is undefined.

KVM's xapic_state_test selftest has been fudging the bug since commit
4b88b1a518b3 ("KVM: selftests: Enhance handling WRMSR ICR register in
x2APIC mode"), which essentially removed the testcase instead of fixing
the bug.

WARN if the nodecode path triggers a #GP, as the CPU is supposed to check
reserved bits for ICR when it's partially virtualized.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a7172ba59ad2..35c4567567a2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2472,7 +2472,7 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 	 * maybe-unecessary write, and both are in the noise anyways.
 	 */
 	if (apic_x2apic_mode(apic) && offset == APIC_ICR)
-		kvm_x2apic_icr_write(apic, kvm_lapic_get_reg64(apic, APIC_ICR));
+		WARN_ON_ONCE(kvm_x2apic_icr_write(apic, kvm_lapic_get_reg64(apic, APIC_ICR)));
 	else
 		kvm_lapic_reg_write(apic, offset, kvm_lapic_get_reg(apic, offset));
 }
@@ -3186,8 +3186,21 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
 	return 0;
 }
 
+#define X2APIC_ICR_RESERVED_BITS (GENMASK_ULL(31, 20) | GENMASK_ULL(17, 16) | BIT(13))
+
 int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
 {
+	if (data & X2APIC_ICR_RESERVED_BITS)
+		return 1;
+
+	/*
+	 * The BUSY bit is reserved on both Intel and AMD in x2APIC mode, but
+	 * only AMD requires it to be zero, Intel essentially just ignores the
+	 * bit.  And if IPI virtualization (Intel) or x2AVIC (AMD) is enabled,
+	 * the CPU performs the reserved bits checks, i.e. the underlying CPU
+	 * behavior will "win".  Arbitrarily clear the BUSY bit, as there is no
+	 * sane way to provide consistent behavior with respect to hardware.
+	 */
 	data &= ~APIC_ICR_BUSY;
 
 	kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
-- 
2.45.2.1089.g2a221341d9-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 02/10] KVM: x86: Move x2APIC ICR helper above kvm_apic_write_nodecode()
  2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
  2024-07-19 23:50 ` [PATCH v2 01/10] KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits Sean Christopherson
@ 2024-07-19 23:50 ` Sean Christopherson
  2024-07-19 23:51 ` [PATCH v2 03/10] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC) Sean Christopherson
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-07-19 23:50 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

Hoist kvm_x2apic_icr_write() above kvm_apic_write_nodecode() so that a
local helper to _read_ the x2APIC ICR can be added and used in the
nodecode path without needing a forward declaration.

No functional change intended.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 46 ++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 35c4567567a2..d14ef485b0bd 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2455,6 +2455,29 @@ void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
 
+#define X2APIC_ICR_RESERVED_BITS (GENMASK_ULL(31, 20) | GENMASK_ULL(17, 16) | BIT(13))
+
+int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
+{
+	if (data & X2APIC_ICR_RESERVED_BITS)
+		return 1;
+
+	/*
+	 * The BUSY bit is reserved on both Intel and AMD in x2APIC mode, but
+	 * only AMD requires it to be zero, Intel essentially just ignores the
+	 * bit.  And if IPI virtualization (Intel) or x2AVIC (AMD) is enabled,
+	 * the CPU performs the reserved bits checks, i.e. the underlying CPU
+	 * behavior will "win".  Arbitrarily clear the BUSY bit, as there is no
+	 * sane way to provide consistent behavior with respect to hardware.
+	 */
+	data &= ~APIC_ICR_BUSY;
+
+	kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
+	kvm_lapic_set_reg64(apic, APIC_ICR, data);
+	trace_kvm_apic_write(APIC_ICR, data);
+	return 0;
+}
+
 /* emulate APIC access in a trap manner */
 void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 {
@@ -3186,29 +3209,6 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
 	return 0;
 }
 
-#define X2APIC_ICR_RESERVED_BITS (GENMASK_ULL(31, 20) | GENMASK_ULL(17, 16) | BIT(13))
-
-int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
-{
-	if (data & X2APIC_ICR_RESERVED_BITS)
-		return 1;
-
-	/*
-	 * The BUSY bit is reserved on both Intel and AMD in x2APIC mode, but
-	 * only AMD requires it to be zero, Intel essentially just ignores the
-	 * bit.  And if IPI virtualization (Intel) or x2AVIC (AMD) is enabled,
-	 * the CPU performs the reserved bits checks, i.e. the underlying CPU
-	 * behavior will "win".  Arbitrarily clear the BUSY bit, as there is no
-	 * sane way to provide consistent behavior with respect to hardware.
-	 */
-	data &= ~APIC_ICR_BUSY;
-
-	kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
-	kvm_lapic_set_reg64(apic, APIC_ICR, data);
-	trace_kvm_apic_write(APIC_ICR, data);
-	return 0;
-}
-
 static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data)
 {
 	u32 low;
-- 
2.45.2.1089.g2a221341d9-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 03/10] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)
  2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
  2024-07-19 23:50 ` [PATCH v2 01/10] KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits Sean Christopherson
  2024-07-19 23:50 ` [PATCH v2 02/10] KVM: x86: Move x2APIC ICR helper above kvm_apic_write_nodecode() Sean Christopherson
@ 2024-07-19 23:51 ` Sean Christopherson
  2024-08-22 18:44   ` Sean Christopherson
  2024-07-19 23:51 ` [PATCH v2 04/10] KVM: selftests: Open code vcpu_run() equivalent in guest_printf test Sean Christopherson
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-07-19 23:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

Re-introduce the "split" x2APIC ICR storage that KVM used prior to Intel's
IPI virtualization support, but only for AMD.  While not stated anywhere
in the APM, despite stating the ICR is a single 64-bit register, AMD CPUs
store the 64-bit ICR as two separate 32-bit values in ICR and ICR2.  When
IPI virtualization (IPIv on Intel, all AVIC flavors on AMD) is enabled,
KVM needs to match CPU behavior as some ICR ICR writes will be handled by
the CPU, not by KVM.

Add a kvm_x86_ops knob to control the underlying format used by the CPU to
store the x2APIC ICR, and tune it to AMD vs. Intel regardless of whether
or not x2AVIC is enabled.  If KVM is handling all ICR writes, the storage
format for x2APIC mode doesn't matter, and having the behavior follow AMD
versus Intel will provide better test coverage and ease debugging.

Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/lapic.c            | 42 +++++++++++++++++++++++----------
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/main.c         |  2 ++
 4 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 950a03e0181e..edc235521434 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1726,6 +1726,8 @@ struct kvm_x86_ops {
 	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
 	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
 	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
+
+	const bool x2apic_icr_is_split;
 	const unsigned long required_apicv_inhibits;
 	bool allow_apicv_in_x2apic_without_x2apic_virtualization;
 	void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d14ef485b0bd..cc0a1008fae4 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2473,11 +2473,25 @@ int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
 	data &= ~APIC_ICR_BUSY;
 
 	kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
-	kvm_lapic_set_reg64(apic, APIC_ICR, data);
+	if (kvm_x86_ops.x2apic_icr_is_split) {
+		kvm_lapic_set_reg(apic, APIC_ICR, data);
+		kvm_lapic_set_reg(apic, APIC_ICR2, data >> 32);
+	} else {
+		kvm_lapic_set_reg64(apic, APIC_ICR, data);
+	}
 	trace_kvm_apic_write(APIC_ICR, data);
 	return 0;
 }
 
+static u64 kvm_x2apic_icr_read(struct kvm_lapic *apic)
+{
+	if (kvm_x86_ops.x2apic_icr_is_split)
+		return (u64)kvm_lapic_get_reg(apic, APIC_ICR) |
+		       (u64)kvm_lapic_get_reg(apic, APIC_ICR2) << 32;
+
+	return kvm_lapic_get_reg64(apic, APIC_ICR);
+}
+
 /* emulate APIC access in a trap manner */
 void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 {
@@ -2495,7 +2509,7 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 	 * maybe-unecessary write, and both are in the noise anyways.
 	 */
 	if (apic_x2apic_mode(apic) && offset == APIC_ICR)
-		WARN_ON_ONCE(kvm_x2apic_icr_write(apic, kvm_lapic_get_reg64(apic, APIC_ICR)));
+		WARN_ON_ONCE(kvm_x2apic_icr_write(apic, kvm_x2apic_icr_read(apic)));
 	else
 		kvm_lapic_reg_write(apic, offset, kvm_lapic_get_reg(apic, offset));
 }
@@ -3005,18 +3019,22 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 
 		/*
 		 * In x2APIC mode, the LDR is fixed and based on the id.  And
-		 * ICR is internally a single 64-bit register, but needs to be
-		 * split to ICR+ICR2 in userspace for backwards compatibility.
+		 * if the ICR is _not_ split, ICR is internally a single 64-bit
+		 * register, but needs to be split to ICR+ICR2 in userspace for
+		 * backwards compatibility.
 		 */
-		if (set) {
+		if (set)
 			*ldr = kvm_apic_calc_x2apic_ldr(*id);
 
-			icr = __kvm_lapic_get_reg(s->regs, APIC_ICR) |
-			      (u64)__kvm_lapic_get_reg(s->regs, APIC_ICR2) << 32;
-			__kvm_lapic_set_reg64(s->regs, APIC_ICR, icr);
-		} else {
-			icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
-			__kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
+		if (!kvm_x86_ops.x2apic_icr_is_split) {
+			if (set) {
+				icr = __kvm_lapic_get_reg(s->regs, APIC_ICR) |
+				      (u64)__kvm_lapic_get_reg(s->regs, APIC_ICR2) << 32;
+				__kvm_lapic_set_reg64(s->regs, APIC_ICR, icr);
+			} else {
+				icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
+				__kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
+			}
 		}
 	}
 
@@ -3214,7 +3232,7 @@ static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data)
 	u32 low;
 
 	if (reg == APIC_ICR) {
-		*data = kvm_lapic_get_reg64(apic, APIC_ICR);
+		*data = kvm_x2apic_icr_read(apic);
 		return 0;
 	}
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c115d26844f7..04c113386de6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5049,6 +5049,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.enable_nmi_window = svm_enable_nmi_window,
 	.enable_irq_window = svm_enable_irq_window,
 	.update_cr8_intercept = svm_update_cr8_intercept,
+
+	.x2apic_icr_is_split = true,
 	.set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
 	.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
 	.apicv_post_state_restore = avic_apicv_post_state_restore,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 0bf35ebe8a1b..a70699665e11 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -89,6 +89,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.enable_nmi_window = vmx_enable_nmi_window,
 	.enable_irq_window = vmx_enable_irq_window,
 	.update_cr8_intercept = vmx_update_cr8_intercept,
+
+	.x2apic_icr_is_split = false,
 	.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
 	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
 	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
-- 
2.45.2.1089.g2a221341d9-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 04/10] KVM: selftests: Open code vcpu_run() equivalent in guest_printf test
  2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
                   ` (2 preceding siblings ...)
  2024-07-19 23:51 ` [PATCH v2 03/10] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC) Sean Christopherson
@ 2024-07-19 23:51 ` Sean Christopherson
  2024-07-19 23:51 ` [PATCH v2 05/10] KVM: selftests: Report unhandled exceptions on x86 as regular guest asserts Sean Christopherson
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-07-19 23:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

Open code a version of vcpu_run() in the guest_printf test in anticipation
of adding UCALL_ABORT handling to _vcpu_run().  The guest_printf test
intentionally generates asserts to verify the output, and thus needs to
bypass common assert handling.

Open code a helper in the guest_printf test, as it's not expected that any
other test would want to skip _only_ the UCALL_ABORT handling.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../testing/selftests/kvm/guest_print_test.c  | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_print_test.c b/tools/testing/selftests/kvm/guest_print_test.c
index 8092c2d0f5d6..bcf582852db9 100644
--- a/tools/testing/selftests/kvm/guest_print_test.c
+++ b/tools/testing/selftests/kvm/guest_print_test.c
@@ -107,6 +107,21 @@ static void ucall_abort(const char *assert_msg, const char *expected_assert_msg)
 		    expected_assert_msg, &assert_msg[offset]);
 }
 
+/*
+ * Open code vcpu_run(), sans the UCALL_ABORT handling, so that intentional
+ * guest asserts guest can be verified instead of being reported as failures.
+ */
+static void do_vcpu_run(struct kvm_vcpu *vcpu)
+{
+	int r;
+
+	do {
+		r = __vcpu_run(vcpu);
+	} while (r == -1 && errno == EINTR);
+
+	TEST_ASSERT(!r, KVM_IOCTL_ERROR(KVM_RUN, r));
+}
+
 static void run_test(struct kvm_vcpu *vcpu, const char *expected_printf,
 		     const char *expected_assert)
 {
@@ -114,7 +129,7 @@ static void run_test(struct kvm_vcpu *vcpu, const char *expected_printf,
 	struct ucall uc;
 
 	while (1) {
-		vcpu_run(vcpu);
+		do_vcpu_run(vcpu);
 
 		TEST_ASSERT(run->exit_reason == UCALL_EXIT_REASON,
 			    "Unexpected exit reason: %u (%s),",
@@ -159,7 +174,7 @@ static void test_limits(void)
 
 	vm = vm_create_with_one_vcpu(&vcpu, guest_code_limits);
 	run = vcpu->run;
-	vcpu_run(vcpu);
+	do_vcpu_run(vcpu);
 
 	TEST_ASSERT(run->exit_reason == UCALL_EXIT_REASON,
 		    "Unexpected exit reason: %u (%s),",
-- 
2.45.2.1089.g2a221341d9-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 05/10] KVM: selftests: Report unhandled exceptions on x86 as regular guest asserts
  2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
                   ` (3 preceding siblings ...)
  2024-07-19 23:51 ` [PATCH v2 04/10] KVM: selftests: Open code vcpu_run() equivalent in guest_printf test Sean Christopherson
@ 2024-07-19 23:51 ` Sean Christopherson
  2024-07-19 23:51 ` [PATCH v2 06/10] KVM: selftests: Add x86 helpers to play nice with x2APIC MSR #GPs Sean Christopherson
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-07-19 23:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

Now that selftests support printf() in the guest, report unexpected
exceptions via the regular assertion framework.  Exceptions were special
cased purely to provide a better error message.  Convert only x86 for now,
as it's low-hanging fruit (already formats the assertion in the guest),
and converting x86 will allow adding asserts in x86 library code without
needing to update multiple tests.

Once all other architectures are converted, this will allow moving the
reporting to common code, which will in turn allow adding asserts in
common library code, and will also allow removing UCALL_UNHANDLED.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/lib/x86_64/processor.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 153739f2e201..814a604c0891 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -566,10 +566,8 @@ void route_exception(struct ex_regs *regs)
 	if (kvm_fixup_exception(regs))
 		return;
 
-	ucall_assert(UCALL_UNHANDLED,
-		     "Unhandled exception in guest", __FILE__, __LINE__,
-		     "Unhandled exception '0x%lx' at guest RIP '0x%lx'",
-		     regs->vector, regs->rip);
+	GUEST_FAIL("Unhandled exception '0x%lx' at guest RIP '0x%lx'",
+		   regs->vector, regs->rip);
 }
 
 static void vm_init_descriptor_tables(struct kvm_vm *vm)
@@ -611,7 +609,7 @@ void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
 {
 	struct ucall uc;
 
-	if (get_ucall(vcpu, &uc) == UCALL_UNHANDLED)
+	if (get_ucall(vcpu, &uc) == UCALL_ABORT)
 		REPORT_GUEST_ASSERT(uc);
 }
 
-- 
2.45.2.1089.g2a221341d9-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 06/10] KVM: selftests: Add x86 helpers to play nice with x2APIC MSR #GPs
  2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
                   ` (4 preceding siblings ...)
  2024-07-19 23:51 ` [PATCH v2 05/10] KVM: selftests: Report unhandled exceptions on x86 as regular guest asserts Sean Christopherson
@ 2024-07-19 23:51 ` Sean Christopherson
  2024-07-19 23:51 ` [PATCH v2 07/10] KVM: selftests: Skip ICR.BUSY test in xapic_state_test if x2APIC is enabled Sean Christopherson
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-07-19 23:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

Add helpers to allow and expect #GP on x2APIC MSRs, and opportunistically
have the existing helper spit out a more useful error message if an
unexpected exception occurs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../selftests/kvm/include/x86_64/apic.h       | 21 ++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/apic.h b/tools/testing/selftests/kvm/include/x86_64/apic.h
index 0f268b55fa06..51990094effd 100644
--- a/tools/testing/selftests/kvm/include/x86_64/apic.h
+++ b/tools/testing/selftests/kvm/include/x86_64/apic.h
@@ -11,6 +11,7 @@
 #include <stdint.h>
 
 #include "processor.h"
+#include "ucall_common.h"
 
 #define APIC_DEFAULT_GPA		0xfee00000ULL
 
@@ -93,9 +94,27 @@ static inline uint64_t x2apic_read_reg(unsigned int reg)
 	return rdmsr(APIC_BASE_MSR + (reg >> 4));
 }
 
+static inline uint8_t x2apic_write_reg_safe(unsigned int reg, uint64_t value)
+{
+	return wrmsr_safe(APIC_BASE_MSR + (reg >> 4), value);
+}
+
 static inline void x2apic_write_reg(unsigned int reg, uint64_t value)
 {
-	wrmsr(APIC_BASE_MSR + (reg >> 4), value);
+	uint8_t fault = x2apic_write_reg_safe(reg, value);
+
+	__GUEST_ASSERT(!fault, "Unexpected fault 0x%x on WRMSR(%x) = %lx\n",
+		       fault, APIC_BASE_MSR + (reg >> 4), value);
 }
 
+static inline void x2apic_write_reg_fault(unsigned int reg, uint64_t value)
+{
+	uint8_t fault = x2apic_write_reg_safe(reg, value);
+
+	__GUEST_ASSERT(fault == GP_VECTOR,
+		       "Wanted #GP on WRMSR(%x) = %lx, got 0x%x\n",
+		       APIC_BASE_MSR + (reg >> 4), value, fault);
+}
+
+
 #endif /* SELFTEST_KVM_APIC_H */
-- 
2.45.2.1089.g2a221341d9-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 07/10] KVM: selftests: Skip ICR.BUSY test in xapic_state_test if x2APIC is enabled
  2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
                   ` (5 preceding siblings ...)
  2024-07-19 23:51 ` [PATCH v2 06/10] KVM: selftests: Add x86 helpers to play nice with x2APIC MSR #GPs Sean Christopherson
@ 2024-07-19 23:51 ` Sean Christopherson
  2024-07-19 23:51 ` [PATCH v2 08/10] KVM: selftests: Test x2APIC ICR reserved bits Sean Christopherson
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-07-19 23:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

Don't test the ICR BUSY bit when x2APIC is enabled as AMD and Intel have
different behavior (AMD #GPs, Intel ignores), and the fact that the CPU
performs the reserved bit checks when IPI virtualization is enabled makes
it impossible for KVM to precisely emulate one or the other.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../selftests/kvm/x86_64/xapic_state_test.c    | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
index 69849acd95b0..928d65948c48 100644
--- a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
@@ -70,12 +70,10 @@ static void ____test_icr(struct xapic_vcpu *x, uint64_t val)
 	vcpu_ioctl(vcpu, KVM_GET_LAPIC, &xapic);
 	icr = (u64)(*((u32 *)&xapic.regs[APIC_ICR])) |
 	      (u64)(*((u32 *)&xapic.regs[APIC_ICR2])) << 32;
-	if (!x->is_x2apic) {
+	if (!x->is_x2apic)
 		val &= (-1u | (0xffull << (32 + 24)));
-		TEST_ASSERT_EQ(icr, val & ~APIC_ICR_BUSY);
-	} else {
-		TEST_ASSERT_EQ(icr & ~APIC_ICR_BUSY, val & ~APIC_ICR_BUSY);
-	}
+
+	TEST_ASSERT_EQ(icr, val & ~APIC_ICR_BUSY);
 }
 
 #define X2APIC_RSVED_BITS_MASK  (GENMASK_ULL(31,20) | \
@@ -91,7 +89,15 @@ static void __test_icr(struct xapic_vcpu *x, uint64_t val)
 		 */
 		val &= ~X2APIC_RSVED_BITS_MASK;
 	}
-	____test_icr(x, val | APIC_ICR_BUSY);
+
+	/*
+	 * The BUSY bit is reserved on both AMD and Intel, but only AMD treats
+	 * it is as _must_ be zero.  Intel simply ignores the bit.  Don't test
+	 * the BUSY bit for x2APIC, as there is no single correct behavior.
+	 */
+	if (!x->is_x2apic)
+		____test_icr(x, val | APIC_ICR_BUSY);
+
 	____test_icr(x, val & ~(u64)APIC_ICR_BUSY);
 }
 
-- 
2.45.2.1089.g2a221341d9-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 08/10] KVM: selftests: Test x2APIC ICR reserved bits
  2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
                   ` (6 preceding siblings ...)
  2024-07-19 23:51 ` [PATCH v2 07/10] KVM: selftests: Skip ICR.BUSY test in xapic_state_test if x2APIC is enabled Sean Christopherson
@ 2024-07-19 23:51 ` Sean Christopherson
  2024-07-19 23:51 ` [PATCH v2 09/10] KVM: selftests: Verify the guest can read back the x2APIC ICR it wrote Sean Christopherson
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-07-19 23:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

Actually test x2APIC ICR reserved bits instead of deliberately skipping
them.  The behavior that is observed when IPI virtualization is enabled is
the architecturally correct behavior, KVM is the one who was wrong, i.e.
KVM was missing reserved bit checks.

Fixes: 4b88b1a518b3 ("KVM: selftests: Enhance handling WRMSR ICR register in x2APIC mode")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../selftests/kvm/x86_64/xapic_state_test.c   | 23 ++++++++-----------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
index 928d65948c48..d701fe9dd686 100644
--- a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
@@ -31,6 +31,10 @@ static void xapic_guest_code(void)
 	}
 }
 
+#define X2APIC_RSVD_BITS_MASK  (GENMASK_ULL(31, 20) | \
+				GENMASK_ULL(17, 16) | \
+				GENMASK_ULL(13, 13))
+
 static void x2apic_guest_code(void)
 {
 	asm volatile("cli");
@@ -41,7 +45,10 @@ static void x2apic_guest_code(void)
 		uint64_t val = x2apic_read_reg(APIC_IRR) |
 			       x2apic_read_reg(APIC_IRR + 0x10) << 32;
 
-		x2apic_write_reg(APIC_ICR, val);
+		if (val & X2APIC_RSVD_BITS_MASK)
+			x2apic_write_reg_fault(APIC_ICR, val);
+		else
+			x2apic_write_reg(APIC_ICR, val);
 		GUEST_SYNC(val);
 	} while (1);
 }
@@ -72,24 +79,14 @@ static void ____test_icr(struct xapic_vcpu *x, uint64_t val)
 	      (u64)(*((u32 *)&xapic.regs[APIC_ICR2])) << 32;
 	if (!x->is_x2apic)
 		val &= (-1u | (0xffull << (32 + 24)));
+	else if (val & X2APIC_RSVD_BITS_MASK)
+		return;
 
 	TEST_ASSERT_EQ(icr, val & ~APIC_ICR_BUSY);
 }
 
-#define X2APIC_RSVED_BITS_MASK  (GENMASK_ULL(31,20) | \
-				 GENMASK_ULL(17,16) | \
-				 GENMASK_ULL(13,13))
-
 static void __test_icr(struct xapic_vcpu *x, uint64_t val)
 {
-	if (x->is_x2apic) {
-		/* Hardware writing vICR register requires reserved bits 31:20,
-		 * 17:16 and 13 kept as zero to avoid #GP exception. Data value
-		 * written to vICR should mask out those bits above.
-		 */
-		val &= ~X2APIC_RSVED_BITS_MASK;
-	}
-
 	/*
 	 * The BUSY bit is reserved on both AMD and Intel, but only AMD treats
 	 * it is as _must_ be zero.  Intel simply ignores the bit.  Don't test
-- 
2.45.2.1089.g2a221341d9-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 09/10] KVM: selftests: Verify the guest can read back the x2APIC ICR it wrote
  2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
                   ` (7 preceding siblings ...)
  2024-07-19 23:51 ` [PATCH v2 08/10] KVM: selftests: Test x2APIC ICR reserved bits Sean Christopherson
@ 2024-07-19 23:51 ` Sean Christopherson
  2024-07-19 23:51 ` [PATCH v2 10/10] KVM: selftests: Play nice with AMD's AVIC errata Sean Christopherson
  2024-08-31  0:20 ` [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
  10 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-07-19 23:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

Now that the BUSY bit mess is gone (for x2APIC), verify that the *guest*
can read back the ICR value that it wrote.  Due to the divergent
behavior between AMD and Intel with respect to the backing storage of the
ICR in the vAPIC page, emulating a seemingly simple MSR write is quite
complex.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/x86_64/xapic_state_test.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
index d701fe9dd686..a940adf429ef 100644
--- a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
@@ -45,10 +45,12 @@ static void x2apic_guest_code(void)
 		uint64_t val = x2apic_read_reg(APIC_IRR) |
 			       x2apic_read_reg(APIC_IRR + 0x10) << 32;
 
-		if (val & X2APIC_RSVD_BITS_MASK)
+		if (val & X2APIC_RSVD_BITS_MASK) {
 			x2apic_write_reg_fault(APIC_ICR, val);
-		else
+		} else {
 			x2apic_write_reg(APIC_ICR, val);
+			GUEST_ASSERT_EQ(x2apic_read_reg(APIC_ICR), val);
+		}
 		GUEST_SYNC(val);
 	} while (1);
 }
-- 
2.45.2.1089.g2a221341d9-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 10/10] KVM: selftests: Play nice with AMD's AVIC errata
  2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
                   ` (8 preceding siblings ...)
  2024-07-19 23:51 ` [PATCH v2 09/10] KVM: selftests: Verify the guest can read back the x2APIC ICR it wrote Sean Christopherson
@ 2024-07-19 23:51 ` Sean Christopherson
  2024-08-31  0:20 ` [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
  10 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-07-19 23:51 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

When AVIC, and thus IPI virtualization on AMD, is enabled, the CPU will
virtualize ICR writes.  Unfortunately, the CPU doesn't do a very good job,
as it fails to clear the BUSY bit and also allows writing ICR2[23:0],
despite them being "RESERVED MBZ".  Account for the quirky behavior in
the xapic_state test to avoid failures in a configuration that likely has
no hope of ever being enabled in production.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 .../selftests/kvm/x86_64/xapic_state_test.c   | 23 +++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
index a940adf429ef..a72bdc4c5c52 100644
--- a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c
@@ -13,6 +13,7 @@
 struct xapic_vcpu {
 	struct kvm_vcpu *vcpu;
 	bool is_x2apic;
+	bool has_xavic_errata;
 };
 
 static void xapic_guest_code(void)
@@ -79,12 +80,17 @@ static void ____test_icr(struct xapic_vcpu *x, uint64_t val)
 	vcpu_ioctl(vcpu, KVM_GET_LAPIC, &xapic);
 	icr = (u64)(*((u32 *)&xapic.regs[APIC_ICR])) |
 	      (u64)(*((u32 *)&xapic.regs[APIC_ICR2])) << 32;
-	if (!x->is_x2apic)
-		val &= (-1u | (0xffull << (32 + 24)));
-	else if (val & X2APIC_RSVD_BITS_MASK)
+	if (!x->is_x2apic) {
+		if (!x->has_xavic_errata)
+			val &= (-1u | (0xffull << (32 + 24)));
+	} else if (val & X2APIC_RSVD_BITS_MASK) {
 		return;
+	}
 
-	TEST_ASSERT_EQ(icr, val & ~APIC_ICR_BUSY);
+	if (x->has_xavic_errata)
+		TEST_ASSERT_EQ(icr & ~APIC_ICR_BUSY, val & ~APIC_ICR_BUSY);
+	else
+		TEST_ASSERT_EQ(icr, val & ~APIC_ICR_BUSY);
 }
 
 static void __test_icr(struct xapic_vcpu *x, uint64_t val)
@@ -209,6 +215,15 @@ int main(int argc, char *argv[])
 	vm = vm_create_with_one_vcpu(&x.vcpu, xapic_guest_code);
 	x.is_x2apic = false;
 
+	/*
+	 * AMD's AVIC implementation is buggy (fails to clear the ICR BUSY bit),
+	 * and also diverges from KVM with respect to ICR2[23:0] (KVM and Intel
+	 * drops writes, AMD does not).  Account for the errata when checking
+	 * that KVM reads back what was written.
+	 */
+	x.has_xavic_errata = host_cpu_is_amd &&
+			     get_kvm_amd_param_bool("avic");
+
 	vcpu_clear_cpuid_feature(x.vcpu, X86_FEATURE_X2APIC);
 
 	virt_pg_map(vm, APIC_DEFAULT_GPA, APIC_DEFAULT_GPA);
-- 
2.45.2.1089.g2a221341d9-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 03/10] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)
  2024-07-19 23:51 ` [PATCH v2 03/10] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC) Sean Christopherson
@ 2024-08-22 18:44   ` Sean Christopherson
  2024-08-22 19:16     ` Tom Lendacky
  0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-08-22 18:44 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, linux-kernel, Maxim Levitsky,
	Suravee Suthikulpanit
  Cc: Tom Lendacky

+Tom

Can someone from AMD confirm that this is indeed the behavior, and that for AMD
CPUs, it's the architectural behavior?

A sanity check on the code would be appreciated too, it'd be nice to get this
into v6.12.

Thanks!

On Fri, Jul 19, 2024, Sean Christopherson wrote:
> Re-introduce the "split" x2APIC ICR storage that KVM used prior to Intel's
> IPI virtualization support, but only for AMD.  While not stated anywhere
> in the APM, despite stating the ICR is a single 64-bit register, AMD CPUs
> store the 64-bit ICR as two separate 32-bit values in ICR and ICR2.  When
> IPI virtualization (IPIv on Intel, all AVIC flavors on AMD) is enabled,
> KVM needs to match CPU behavior as some ICR ICR writes will be handled by
> the CPU, not by KVM.
> 
> Add a kvm_x86_ops knob to control the underlying format used by the CPU to
> store the x2APIC ICR, and tune it to AMD vs. Intel regardless of whether
> or not x2AVIC is enabled.  If KVM is handling all ICR writes, the storage
> format for x2APIC mode doesn't matter, and having the behavior follow AMD
> versus Intel will provide better test coverage and ease debugging.
> 
> Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
> Cc: stable@vger.kernel.org
> Cc: Maxim Levitsky <mlevitsk@redhat.com>
> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/lapic.c            | 42 +++++++++++++++++++++++----------
>  arch/x86/kvm/svm/svm.c          |  2 ++
>  arch/x86/kvm/vmx/main.c         |  2 ++
>  4 files changed, 36 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 950a03e0181e..edc235521434 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1726,6 +1726,8 @@ struct kvm_x86_ops {
>  	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
>  	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
>  	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> +
> +	const bool x2apic_icr_is_split;
>  	const unsigned long required_apicv_inhibits;
>  	bool allow_apicv_in_x2apic_without_x2apic_virtualization;
>  	void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index d14ef485b0bd..cc0a1008fae4 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2473,11 +2473,25 @@ int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
>  	data &= ~APIC_ICR_BUSY;
>  
>  	kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
> -	kvm_lapic_set_reg64(apic, APIC_ICR, data);
> +	if (kvm_x86_ops.x2apic_icr_is_split) {
> +		kvm_lapic_set_reg(apic, APIC_ICR, data);
> +		kvm_lapic_set_reg(apic, APIC_ICR2, data >> 32);
> +	} else {
> +		kvm_lapic_set_reg64(apic, APIC_ICR, data);
> +	}
>  	trace_kvm_apic_write(APIC_ICR, data);
>  	return 0;
>  }
>  
> +static u64 kvm_x2apic_icr_read(struct kvm_lapic *apic)
> +{
> +	if (kvm_x86_ops.x2apic_icr_is_split)
> +		return (u64)kvm_lapic_get_reg(apic, APIC_ICR) |
> +		       (u64)kvm_lapic_get_reg(apic, APIC_ICR2) << 32;
> +
> +	return kvm_lapic_get_reg64(apic, APIC_ICR);
> +}
> +
>  /* emulate APIC access in a trap manner */
>  void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
>  {
> @@ -2495,7 +2509,7 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
>  	 * maybe-unecessary write, and both are in the noise anyways.
>  	 */
>  	if (apic_x2apic_mode(apic) && offset == APIC_ICR)
> -		WARN_ON_ONCE(kvm_x2apic_icr_write(apic, kvm_lapic_get_reg64(apic, APIC_ICR)));
> +		WARN_ON_ONCE(kvm_x2apic_icr_write(apic, kvm_x2apic_icr_read(apic)));
>  	else
>  		kvm_lapic_reg_write(apic, offset, kvm_lapic_get_reg(apic, offset));
>  }
> @@ -3005,18 +3019,22 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>  
>  		/*
>  		 * In x2APIC mode, the LDR is fixed and based on the id.  And
> -		 * ICR is internally a single 64-bit register, but needs to be
> -		 * split to ICR+ICR2 in userspace for backwards compatibility.
> +		 * if the ICR is _not_ split, ICR is internally a single 64-bit
> +		 * register, but needs to be split to ICR+ICR2 in userspace for
> +		 * backwards compatibility.
>  		 */
> -		if (set) {
> +		if (set)
>  			*ldr = kvm_apic_calc_x2apic_ldr(*id);
>  
> -			icr = __kvm_lapic_get_reg(s->regs, APIC_ICR) |
> -			      (u64)__kvm_lapic_get_reg(s->regs, APIC_ICR2) << 32;
> -			__kvm_lapic_set_reg64(s->regs, APIC_ICR, icr);
> -		} else {
> -			icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
> -			__kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
> +		if (!kvm_x86_ops.x2apic_icr_is_split) {
> +			if (set) {
> +				icr = __kvm_lapic_get_reg(s->regs, APIC_ICR) |
> +				      (u64)__kvm_lapic_get_reg(s->regs, APIC_ICR2) << 32;
> +				__kvm_lapic_set_reg64(s->regs, APIC_ICR, icr);
> +			} else {
> +				icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
> +				__kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
> +			}
>  		}
>  	}
>  
> @@ -3214,7 +3232,7 @@ static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data)
>  	u32 low;
>  
>  	if (reg == APIC_ICR) {
> -		*data = kvm_lapic_get_reg64(apic, APIC_ICR);
> +		*data = kvm_x2apic_icr_read(apic);
>  		return 0;
>  	}
>  
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index c115d26844f7..04c113386de6 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5049,6 +5049,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>  	.enable_nmi_window = svm_enable_nmi_window,
>  	.enable_irq_window = svm_enable_irq_window,
>  	.update_cr8_intercept = svm_update_cr8_intercept,
> +
> +	.x2apic_icr_is_split = true,
>  	.set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
>  	.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
>  	.apicv_post_state_restore = avic_apicv_post_state_restore,
> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
> index 0bf35ebe8a1b..a70699665e11 100644
> --- a/arch/x86/kvm/vmx/main.c
> +++ b/arch/x86/kvm/vmx/main.c
> @@ -89,6 +89,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
>  	.enable_nmi_window = vmx_enable_nmi_window,
>  	.enable_irq_window = vmx_enable_irq_window,
>  	.update_cr8_intercept = vmx_update_cr8_intercept,
> +
> +	.x2apic_icr_is_split = false,
>  	.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
>  	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
>  	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
> -- 
> 2.45.2.1089.g2a221341d9-goog
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 03/10] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)
  2024-08-22 18:44   ` Sean Christopherson
@ 2024-08-22 19:16     ` Tom Lendacky
  2024-08-22 19:41       ` Sean Christopherson
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Lendacky @ 2024-08-22 19:16 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini, kvm, linux-kernel,
	Maxim Levitsky, Suravee Suthikulpanit

On 8/22/24 13:44, Sean Christopherson wrote:
> +Tom
> 
> Can someone from AMD confirm that this is indeed the behavior, and that for AMD
> CPUs, it's the architectural behavior?

In section "16.11 Accessing x2APIC Register" of APM Vol 2, there is this
statement:

"For 64-bit x2APIC registers, the high-order bits (bits 63:32) are
mapped to EDX[31:0]"

and in section "16.11.1 x2APIC Register Address Space" of APM Vol 2,
there is this statement:

"The two 32-bit Interrupt Command Registers in APIC mode (MMIO offsets
300h and 310h) are merged into a single 64-bit x2APIC register at MSR
address 830h."

So I believe this isn't necessary. @Suravee, agree?

Are you seeing a bug related to this?

Thanks,
Tom

> 
> A sanity check on the code would be appreciated too, it'd be nice to get this
> into v6.12.
> 
> Thanks!
> 
> On Fri, Jul 19, 2024, Sean Christopherson wrote:
>> Re-introduce the "split" x2APIC ICR storage that KVM used prior to Intel's
>> IPI virtualization support, but only for AMD.  While not stated anywhere
>> in the APM, despite stating the ICR is a single 64-bit register, AMD CPUs
>> store the 64-bit ICR as two separate 32-bit values in ICR and ICR2.  When
>> IPI virtualization (IPIv on Intel, all AVIC flavors on AMD) is enabled,
>> KVM needs to match CPU behavior as some ICR ICR writes will be handled by
>> the CPU, not by KVM.
>>
>> Add a kvm_x86_ops knob to control the underlying format used by the CPU to
>> store the x2APIC ICR, and tune it to AMD vs. Intel regardless of whether
>> or not x2AVIC is enabled.  If KVM is handling all ICR writes, the storage
>> format for x2APIC mode doesn't matter, and having the behavior follow AMD
>> versus Intel will provide better test coverage and ease debugging.
>>
>> Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
>> Cc: stable@vger.kernel.org
>> Cc: Maxim Levitsky <mlevitsk@redhat.com>
>> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> ---
>>  arch/x86/include/asm/kvm_host.h |  2 ++
>>  arch/x86/kvm/lapic.c            | 42 +++++++++++++++++++++++----------
>>  arch/x86/kvm/svm/svm.c          |  2 ++
>>  arch/x86/kvm/vmx/main.c         |  2 ++
>>  4 files changed, 36 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 950a03e0181e..edc235521434 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1726,6 +1726,8 @@ struct kvm_x86_ops {
>>  	void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
>>  	void (*enable_irq_window)(struct kvm_vcpu *vcpu);
>>  	void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
>> +
>> +	const bool x2apic_icr_is_split;
>>  	const unsigned long required_apicv_inhibits;
>>  	bool allow_apicv_in_x2apic_without_x2apic_virtualization;
>>  	void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index d14ef485b0bd..cc0a1008fae4 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -2473,11 +2473,25 @@ int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
>>  	data &= ~APIC_ICR_BUSY;
>>  
>>  	kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
>> -	kvm_lapic_set_reg64(apic, APIC_ICR, data);
>> +	if (kvm_x86_ops.x2apic_icr_is_split) {
>> +		kvm_lapic_set_reg(apic, APIC_ICR, data);
>> +		kvm_lapic_set_reg(apic, APIC_ICR2, data >> 32);
>> +	} else {
>> +		kvm_lapic_set_reg64(apic, APIC_ICR, data);
>> +	}
>>  	trace_kvm_apic_write(APIC_ICR, data);
>>  	return 0;
>>  }
>>  
>> +static u64 kvm_x2apic_icr_read(struct kvm_lapic *apic)
>> +{
>> +	if (kvm_x86_ops.x2apic_icr_is_split)
>> +		return (u64)kvm_lapic_get_reg(apic, APIC_ICR) |
>> +		       (u64)kvm_lapic_get_reg(apic, APIC_ICR2) << 32;
>> +
>> +	return kvm_lapic_get_reg64(apic, APIC_ICR);
>> +}
>> +
>>  /* emulate APIC access in a trap manner */
>>  void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
>>  {
>> @@ -2495,7 +2509,7 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
>>  	 * maybe-unecessary write, and both are in the noise anyways.
>>  	 */
>>  	if (apic_x2apic_mode(apic) && offset == APIC_ICR)
>> -		WARN_ON_ONCE(kvm_x2apic_icr_write(apic, kvm_lapic_get_reg64(apic, APIC_ICR)));
>> +		WARN_ON_ONCE(kvm_x2apic_icr_write(apic, kvm_x2apic_icr_read(apic)));
>>  	else
>>  		kvm_lapic_reg_write(apic, offset, kvm_lapic_get_reg(apic, offset));
>>  }
>> @@ -3005,18 +3019,22 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>>  
>>  		/*
>>  		 * In x2APIC mode, the LDR is fixed and based on the id.  And
>> -		 * ICR is internally a single 64-bit register, but needs to be
>> -		 * split to ICR+ICR2 in userspace for backwards compatibility.
>> +		 * if the ICR is _not_ split, ICR is internally a single 64-bit
>> +		 * register, but needs to be split to ICR+ICR2 in userspace for
>> +		 * backwards compatibility.
>>  		 */
>> -		if (set) {
>> +		if (set)
>>  			*ldr = kvm_apic_calc_x2apic_ldr(*id);
>>  
>> -			icr = __kvm_lapic_get_reg(s->regs, APIC_ICR) |
>> -			      (u64)__kvm_lapic_get_reg(s->regs, APIC_ICR2) << 32;
>> -			__kvm_lapic_set_reg64(s->regs, APIC_ICR, icr);
>> -		} else {
>> -			icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
>> -			__kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
>> +		if (!kvm_x86_ops.x2apic_icr_is_split) {
>> +			if (set) {
>> +				icr = __kvm_lapic_get_reg(s->regs, APIC_ICR) |
>> +				      (u64)__kvm_lapic_get_reg(s->regs, APIC_ICR2) << 32;
>> +				__kvm_lapic_set_reg64(s->regs, APIC_ICR, icr);
>> +			} else {
>> +				icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
>> +				__kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
>> +			}
>>  		}
>>  	}
>>  
>> @@ -3214,7 +3232,7 @@ static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data)
>>  	u32 low;
>>  
>>  	if (reg == APIC_ICR) {
>> -		*data = kvm_lapic_get_reg64(apic, APIC_ICR);
>> +		*data = kvm_x2apic_icr_read(apic);
>>  		return 0;
>>  	}
>>  
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index c115d26844f7..04c113386de6 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -5049,6 +5049,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
>>  	.enable_nmi_window = svm_enable_nmi_window,
>>  	.enable_irq_window = svm_enable_irq_window,
>>  	.update_cr8_intercept = svm_update_cr8_intercept,
>> +
>> +	.x2apic_icr_is_split = true,
>>  	.set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
>>  	.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
>>  	.apicv_post_state_restore = avic_apicv_post_state_restore,
>> diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
>> index 0bf35ebe8a1b..a70699665e11 100644
>> --- a/arch/x86/kvm/vmx/main.c
>> +++ b/arch/x86/kvm/vmx/main.c
>> @@ -89,6 +89,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
>>  	.enable_nmi_window = vmx_enable_nmi_window,
>>  	.enable_irq_window = vmx_enable_irq_window,
>>  	.update_cr8_intercept = vmx_update_cr8_intercept,
>> +
>> +	.x2apic_icr_is_split = false,
>>  	.set_virtual_apic_mode = vmx_set_virtual_apic_mode,
>>  	.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
>>  	.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
>> -- 
>> 2.45.2.1089.g2a221341d9-goog
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 03/10] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)
  2024-08-22 19:16     ` Tom Lendacky
@ 2024-08-22 19:41       ` Sean Christopherson
  2024-08-22 20:29         ` Tom Lendacky
  0 siblings, 1 reply; 18+ messages in thread
From: Sean Christopherson @ 2024-08-22 19:41 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Paolo Bonzini, kvm, linux-kernel, Maxim Levitsky,
	Suravee Suthikulpanit

On Thu, Aug 22, 2024, Tom Lendacky wrote:
> On 8/22/24 13:44, Sean Christopherson wrote:
> > +Tom
> > 
> > Can someone from AMD confirm that this is indeed the behavior, and that for AMD
> > CPUs, it's the architectural behavior?
> 
> In section "16.11 Accessing x2APIC Register" of APM Vol 2, there is this
> statement:
> 
> "For 64-bit x2APIC registers, the high-order bits (bits 63:32) are
> mapped to EDX[31:0]"
> 
> and in section "16.11.1 x2APIC Register Address Space" of APM Vol 2,
> there is this statement:
> 
> "The two 32-bit Interrupt Command Registers in APIC mode (MMIO offsets
> 300h and 310h) are merged into a single 64-bit x2APIC register at MSR
> address 830h."
> 
> So I believe this isn't necessary. @Suravee, agree?
> 
> Are you seeing a bug related to this?

Yep.  With APICv and x2APIC enabled, Intel CPUs use a single 64-bit value at
offset 300h for the backing storage.  This is what KVM currently implements,
e.g. when pulling state out of the vAPIC page for migration, and when emulating
a RDMSR(ICR).

With AVIC and x2APIC (a.k.a. x2AVIC enabled), Genoa uses the legacy MMIO offsets
for storage, at least AFAICT.  I.e. the single MSR at 830h is split into separate
32-bit values at 300h and 310h on WRMSR, and then reconstituted on RDMSR.

The APM doesn't actually clarify the layout of the backing storage, i.e. doesn't
explicitly say that the full 64-bit value is stored at 300h.  IIRC, Intel's SDM
doesn't say that either, i.e. KVM's behavior is fully based on throwing noodles
and seeing what sticks.

FWIW, AMD's behavior actually makes sense, at it provides a consistent layout
when switching between xAPIC and x2APIC.  Intel's does too, from the perspective
of being able to emit single loads/stores.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 03/10] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)
  2024-08-22 19:41       ` Sean Christopherson
@ 2024-08-22 20:29         ` Tom Lendacky
  2024-08-22 23:10           ` Sean Christopherson
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Lendacky @ 2024-08-22 20:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, kvm, linux-kernel, Maxim Levitsky,
	Suravee Suthikulpanit

On 8/22/24 14:41, Sean Christopherson wrote:
> On Thu, Aug 22, 2024, Tom Lendacky wrote:
>> On 8/22/24 13:44, Sean Christopherson wrote:
>>> +Tom
>>>
>>> Can someone from AMD confirm that this is indeed the behavior, and that for AMD
>>> CPUs, it's the architectural behavior?
>>
>> In section "16.11 Accessing x2APIC Register" of APM Vol 2, there is this
>> statement:
>>
>> "For 64-bit x2APIC registers, the high-order bits (bits 63:32) are
>> mapped to EDX[31:0]"
>>
>> and in section "16.11.1 x2APIC Register Address Space" of APM Vol 2,
>> there is this statement:
>>
>> "The two 32-bit Interrupt Command Registers in APIC mode (MMIO offsets
>> 300h and 310h) are merged into a single 64-bit x2APIC register at MSR
>> address 830h."
>>
>> So I believe this isn't necessary. @Suravee, agree?
>>
>> Are you seeing a bug related to this?
> 
> Yep.  With APICv and x2APIC enabled, Intel CPUs use a single 64-bit value at
> offset 300h for the backing storage.  This is what KVM currently implements,
> e.g. when pulling state out of the vAPIC page for migration, and when emulating
> a RDMSR(ICR).
> 
> With AVIC and x2APIC (a.k.a. x2AVIC enabled), Genoa uses the legacy MMIO offsets
> for storage, at least AFAICT.  I.e. the single MSR at 830h is split into separate
> 32-bit values at 300h and 310h on WRMSR, and then reconstituted on RDMSR.
> 
> The APM doesn't actually clarify the layout of the backing storage, i.e. doesn't
> explicitly say that the full 64-bit value is stored at 300h.  IIRC, Intel's SDM

Ah, for x2AVIC, yes, you have to do two writes to the backing page. One
at offset 0x300 and one at offset 0x310 (confirmed with the hardware
team). The order shouldn't matter since the guest vCPU isn't running
when you're writing the values, but you should do the IRC High write
first, followed by the ICR Low.

Thanks,
Tom

> doesn't say that either, i.e. KVM's behavior is fully based on throwing noodles
> and seeing what sticks.
> 
> FWIW, AMD's behavior actually makes sense, at it provides a consistent layout
> when switching between xAPIC and x2APIC.  Intel's does too, from the perspective
> of being able to emit single loads/stores.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 03/10] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)
  2024-08-22 20:29         ` Tom Lendacky
@ 2024-08-22 23:10           ` Sean Christopherson
  0 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-08-22 23:10 UTC (permalink / raw)
  To: Tom Lendacky
  Cc: Paolo Bonzini, kvm, linux-kernel, Maxim Levitsky,
	Suravee Suthikulpanit

On Thu, Aug 22, 2024, Tom Lendacky wrote:
> On 8/22/24 14:41, Sean Christopherson wrote:
> > On Thu, Aug 22, 2024, Tom Lendacky wrote:
> >> On 8/22/24 13:44, Sean Christopherson wrote:
> >>> +Tom
> >>>
> >>> Can someone from AMD confirm that this is indeed the behavior, and that for AMD
> >>> CPUs, it's the architectural behavior?
> >>
> >> In section "16.11 Accessing x2APIC Register" of APM Vol 2, there is this
> >> statement:
> >>
> >> "For 64-bit x2APIC registers, the high-order bits (bits 63:32) are
> >> mapped to EDX[31:0]"
> >>
> >> and in section "16.11.1 x2APIC Register Address Space" of APM Vol 2,
> >> there is this statement:
> >>
> >> "The two 32-bit Interrupt Command Registers in APIC mode (MMIO offsets
> >> 300h and 310h) are merged into a single 64-bit x2APIC register at MSR
> >> address 830h."
> >>
> >> So I believe this isn't necessary. @Suravee, agree?
> >>
> >> Are you seeing a bug related to this?
> > 
> > Yep.  With APICv and x2APIC enabled, Intel CPUs use a single 64-bit value at
> > offset 300h for the backing storage.  This is what KVM currently implements,
> > e.g. when pulling state out of the vAPIC page for migration, and when emulating
> > a RDMSR(ICR).
> > 
> > With AVIC and x2APIC (a.k.a. x2AVIC enabled), Genoa uses the legacy MMIO offsets
> > for storage, at least AFAICT.  I.e. the single MSR at 830h is split into separate
> > 32-bit values at 300h and 310h on WRMSR, and then reconstituted on RDMSR.
> > 
> > The APM doesn't actually clarify the layout of the backing storage, i.e. doesn't
> > explicitly say that the full 64-bit value is stored at 300h.  IIRC, Intel's SDM
> 
> Ah, for x2AVIC, yes, you have to do two writes to the backing page. One
> at offset 0x300 and one at offset 0x310 (confirmed with the hardware
> team). The order shouldn't matter since the guest vCPU isn't running
> when you're writing the values, but you should do the IRC High write
> first, followed by the ICR Low.

Thanks Tom!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active
  2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
                   ` (9 preceding siblings ...)
  2024-07-19 23:51 ` [PATCH v2 10/10] KVM: selftests: Play nice with AMD's AVIC errata Sean Christopherson
@ 2024-08-31  0:20 ` Sean Christopherson
  10 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-08-31  0:20 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Maxim Levitsky, Suravee Suthikulpanit

On Fri, 19 Jul 2024 16:50:57 -0700, Sean Christopherson wrote:
> I made the mistake of expanding my testing to run with and without AVIC
> enabled, and to my surprise (wow, sarcasm), x2AVIC failed hard on the
> xapic_state_test due to ICR issues.
> 
> AFAICT, the issue is that AMD splits the 64-bit ICR into the legacy ICR
> and ICR2 fields when storing the ICR in the vAPIC (apparently "it's a
> single 64-bit register" is open to intepretation).  Aside from causing
> the selftest failure and potential live migration issues, botching the
> format is quite bad, as KVM will mishandle incomplete virtualized IPIs,
> e.g. generate IRQs to the wrong vCPU, drop IRQs, etc.
> 
> [...]

Applied to kvm-x86 misc, thanks!

[01/10] KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits
        https://github.com/kvm-x86/linux/commit/71bf395a276f
[02/10] KVM: x86: Move x2APIC ICR helper above kvm_apic_write_nodecode()
        https://github.com/kvm-x86/linux/commit/d33234342f8b
[03/10] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)
        https://github.com/kvm-x86/linux/commit/73b42dc69be8
[04/10] KVM: selftests: Open code vcpu_run() equivalent in guest_printf test
        https://github.com/kvm-x86/linux/commit/d1c2cdca5a08
[05/10] KVM: selftests: Report unhandled exceptions on x86 as regular guest asserts
        https://github.com/kvm-x86/linux/commit/ed24ba6c2c34
[06/10] KVM: selftests: Add x86 helpers to play nice with x2APIC MSR #GPs
        https://github.com/kvm-x86/linux/commit/f2e91e874179
[07/10] KVM: selftests: Skip ICR.BUSY test in xapic_state_test if x2APIC is enabled
        https://github.com/kvm-x86/linux/commit/faf06a238254
[08/10] KVM: selftests: Test x2APIC ICR reserved bits
        https://github.com/kvm-x86/linux/commit/3426cb48adb4
[09/10] KVM: selftests: Verify the guest can read back the x2APIC ICR it wrote
        https://github.com/kvm-x86/linux/commit/0cb26ec32085
[10/10] KVM: selftests: Play nice with AMD's AVIC errata
        https://github.com/kvm-x86/linux/commit/5a7c7d148e48

--
https://github.com/kvm-x86/linux/tree/next

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 01/10] KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits
  2024-07-19 23:50 ` [PATCH v2 01/10] KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits Sean Christopherson
@ 2024-11-01 18:34   ` Sean Christopherson
  0 siblings, 0 replies; 18+ messages in thread
From: Sean Christopherson @ 2024-11-01 18:34 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, linux-kernel, Maxim Levitsky,
	Suravee Suthikulpanit

On Fri, Jul 19, 2024, Sean Christopherson wrote:
> Inject a #GP on a WRMSR(ICR) that attempts to set any reserved bits that
> are must-be-zero on both Intel and AMD, i.e. any reserved bits other than
> the BUSY bit, which Intel ignores and basically says is undefined.
> 
> KVM's xapic_state_test selftest has been fudging the bug since commit
> 4b88b1a518b3 ("KVM: selftests: Enhance handling WRMSR ICR register in
> x2APIC mode"), which essentially removed the testcase instead of fixing
> the bug.
> 
> WARN if the nodecode path triggers a #GP, as the CPU is supposed to check
> reserved bits for ICR when it's partially virtualized.

Apparently this isn't accurate, as I've now hit the WARN twice with x2AVIC.  I
haven't debugged in depth, but it's either INVALID_TARGET and INVALID_INT_TYPE.
Which is odd, because the WARN only happens rarely, e.g. appears to be a race of
some form.  But I wouldn't expect those checks to be subject to races.

Ah, but maybe this one is referring to the VALID bit?

  address is not present in the physical or logical ID tables

If that's the case, then (a) ucode is buggy (IMO) and is doing table lookups
*before* reserved bits checks, and (b) I don't see a better option than simply
deleting the WARN.

  ------------[ cut here ]------------
  WARNING: CPU: 146 PID: 274555 at arch/x86/kvm/lapic.c:2521 kvm_apic_write_nodecode+0x7a/0x90 [kvm]
  Modules linked in: kvm_amd kvm ... [last unloaded: kvm]
  CPU: 146 UID: 0 PID: 274555 Comm: qemu Not tainted 6.12.0-smp--41585e8a34cb-sink #458
  Hardware name: Google Astoria/astoria, BIOS 0.20240617.0-0 06/17/2024
  RIP: 0010:kvm_apic_write_nodecode+0x7a/0x90 [kvm]
  RSP: 0018:ff51c04b4d133be8 EFLAGS: 00010202
  RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000000cffff
  RDX: 0000000087fd0e00 RSI: 00000000000cffff RDI: ff42132c9e336f00
  RBP: ff51c04b4d133e50 R08: 0000000000000000 R09: 0000000000060000
  R10: ffffffffc067428f R11: ffffffffc080aa20 R12: 00000000000cffff
  R13: 0000000000000000 R14: ff42132d09e7c2c0 R15: 0000000000000000
  FS:  00007fc1af0006c0(0000) GS:ff42138a08500000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000006267e52001 CR4: 0000000000771ef0
  PKRU: 00000000
  Call Trace:
   <TASK>
   avic_incomplete_ipi_interception+0x24a/0x4c0 [kvm_amd]
   kvm_arch_vcpu_ioctl_run+0x1e11/0x2720 [kvm]
   kvm_vcpu_ioctl+0x54f/0x630 [kvm]
   __se_sys_ioctl+0x6b/0xc0
   do_syscall_64+0x83/0x160
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
  RIP: 0033:0x7fc1b584624b
   </TASK>
  ---[ end trace 0000000000000000 ]---

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-11-01 18:34 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-19 23:50 [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson
2024-07-19 23:50 ` [PATCH v2 01/10] KVM: x86: Enforce x2APIC's must-be-zero reserved ICR bits Sean Christopherson
2024-11-01 18:34   ` Sean Christopherson
2024-07-19 23:50 ` [PATCH v2 02/10] KVM: x86: Move x2APIC ICR helper above kvm_apic_write_nodecode() Sean Christopherson
2024-07-19 23:51 ` [PATCH v2 03/10] KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC) Sean Christopherson
2024-08-22 18:44   ` Sean Christopherson
2024-08-22 19:16     ` Tom Lendacky
2024-08-22 19:41       ` Sean Christopherson
2024-08-22 20:29         ` Tom Lendacky
2024-08-22 23:10           ` Sean Christopherson
2024-07-19 23:51 ` [PATCH v2 04/10] KVM: selftests: Open code vcpu_run() equivalent in guest_printf test Sean Christopherson
2024-07-19 23:51 ` [PATCH v2 05/10] KVM: selftests: Report unhandled exceptions on x86 as regular guest asserts Sean Christopherson
2024-07-19 23:51 ` [PATCH v2 06/10] KVM: selftests: Add x86 helpers to play nice with x2APIC MSR #GPs Sean Christopherson
2024-07-19 23:51 ` [PATCH v2 07/10] KVM: selftests: Skip ICR.BUSY test in xapic_state_test if x2APIC is enabled Sean Christopherson
2024-07-19 23:51 ` [PATCH v2 08/10] KVM: selftests: Test x2APIC ICR reserved bits Sean Christopherson
2024-07-19 23:51 ` [PATCH v2 09/10] KVM: selftests: Verify the guest can read back the x2APIC ICR it wrote Sean Christopherson
2024-07-19 23:51 ` [PATCH v2 10/10] KVM: selftests: Play nice with AMD's AVIC errata Sean Christopherson
2024-08-31  0:20 ` [PATCH v2 00/10] KVM: x86: Fix ICR handling when x2AVIC is active Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).