public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/3] Add support for the Idle HLT intercept feature
@ 2025-01-28 12:48 Manali Shukla
  2025-01-28 12:48 ` [PATCH v6 1/3] x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept Manali Shukla
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Manali Shukla @ 2025-01-28 12:48 UTC (permalink / raw)
  To: kvm, linux-kselftest
  Cc: pbonzini, seanjc, shuah, nikunj, thomas.lendacky, vkuznets,
	manali.shukla, bp, babu.moger, neeraj.upadhyay

The upcoming new Idle HLT Intercept feature allows for the HLT
instruction execution by a vCPU to be intercepted by the hypervisor
only if there are no pending V_INTR and V_NMI events for the vCPU.
When the vCPU is expected to service the pending V_INTR and V_NMI
events, the Idle HLT intercept won’t trigger. The feature allows the
hypervisor to determine if the vCPU is actually idle and reduces
wasteful VMEXITs.

The Idle HLT intercept feature is used for enlightened guests who wish
to securely handle the events. When an enlightened guest does a HLT
while an interrupt is pending, hypervisor will not have a way to
figure out whether the guest needs to be re-entered or not. The Idle
HLT intercept feature allows the HLT execution only if there are no
pending V_INTR and V_NMI events.

Presence of the Idle HLT Intercept feature is indicated via CPUID
function Fn8000_000A_EDX[30].

Document for the Idle HLT intercept feature is available at [1].

This series is based on kvm-x86/next (eb723766b103) + [2].

Testing Done:
- Tested the functionality for the Idle HLT intercept feature
  using selftest ipi_hlt_test.
- Tested on normal, SEV, SEV-ES, SEV-SNP guest for the Idle HLT intercept
  functionality.
- Tested the Idle HLT intercept functionality on nested guest.

v5 -> v6
- Incorporated Neeraj's review comments on selftest.

v4 -> v5
- Incorporated Sean's review comments on nested Idle HLT intercept support.
- Make svm_idle_hlt_test independent of the Idle HLT to run on all hardware.

v3 -> v4
- Drop the patches to add vcpu_get_stat() into a new series [2].
- Added nested Idle HLT intercept support.

v2 -> v3
- Incorporated Andrew's suggestion to structure vcpu_stat_types in
  a way that each architecture can share the generic types and also
  provide its own.

v1 -> v2
- Did changes in svm_idle_hlt_test based on the review comments from Sean.
- Added an enum based approach to get binary stats in vcpu_get_stat() which
  doesn't use string to get stat data based on the comments from Sean.
- Added safe_halt() and cli() helpers based on the comments from Sean.

[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
     Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
     https://bugzilla.kernel.org/attachment.cgi?id=306251

[2]: https://lore.kernel.org/kvm/ee027335-f1b9-4637-bc79-27a610c1ab08@amd.com/T/#u

---
V5: https://lore.kernel.org/kvm/20250103081828.7060-1-manali.shukla@amd.com/
V4: https://lore.kernel.org/kvm/20241022054810.23369-1-manali.shukla@amd.com/
V3: https://lore.kernel.org/kvm/20240528041926.3989-4-manali.shukla@amd.com/T/
V2: https://lore.kernel.org/kvm/20240501145433.4070-1-manali.shukla@amd.com/
V1: https://lore.kernel.org/kvm/20240307054623.13632-1-manali.shukla@amd.com/

Manali Shukla (3):
  x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
  KVM: SVM: Add Idle HLT intercept support
  KVM: selftests: Add self IPI HLT test

 arch/x86/include/asm/cpufeatures.h            |  1 +
 arch/x86/include/asm/svm.h                    |  1 +
 arch/x86/include/uapi/asm/svm.h               |  2 +
 arch/x86/kvm/svm/svm.c                        | 13 ++-
 tools/testing/selftests/kvm/Makefile.kvm      |  1 +
 .../selftests/kvm/include/x86/processor.h     |  1 +
 tools/testing/selftests/kvm/ipi_hlt_test.c    | 81 +++++++++++++++++++
 7 files changed, 97 insertions(+), 3 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/ipi_hlt_test.c


base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0
prerequisite-patch-id: cb345fc0d814a351df2b5788b76eee0eef9de549
prerequisite-patch-id: 71806f400cffe09f47d6231cb072cbdbd540de1b
prerequisite-patch-id: 9ea0412aab7ecd8555fcee3e9609dbfe8456d47b
prerequisite-patch-id: 3504df50cdd33958456f2e56139d76867273525c
prerequisite-patch-id: 674e56729a56cc487cb85be1a64ef561eb7bac8a
prerequisite-patch-id: 48e87354f9d6e6bd121ca32ab73cd0d7f1dce74f
prerequisite-patch-id: b32c21df6522a7396baa41d62bcad9479041d97a
prerequisite-patch-id: 0ff4b504e982db7c1dfa8ec6ac485c92a89f4af8
prerequisite-patch-id: 509018dc2fc1657debc641544e86f5a92d04bc1a
-- 
2.34.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v6 1/3] x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
  2025-01-28 12:48 [PATCH v6 0/3] Add support for the Idle HLT intercept feature Manali Shukla
@ 2025-01-28 12:48 ` Manali Shukla
  2025-01-28 12:48 ` [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support Manali Shukla
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Manali Shukla @ 2025-01-28 12:48 UTC (permalink / raw)
  To: kvm, linux-kselftest
  Cc: pbonzini, seanjc, shuah, nikunj, thomas.lendacky, vkuznets,
	manali.shukla, bp, babu.moger, neeraj.upadhyay

From: Manali Shukla <Manali.Shukla@amd.com>

The Idle HLT Intercept feature allows for the HLT instruction
execution by a vCPU to be intercepted by the hypervisor only if there
are no pending events (V_INTR and V_NMI) for the vCPU. When the vCPU
is expected to service the pending events (V_INTR and V_NMI), the Idle
HLT intercept won’t trigger. The feature allows the hypervisor to
determine if the vCPU is idle and reduces wasteful VMEXITs.

In addition to the aforementioned use case, the Idle HLT intercept
feature is also used for enlightened guests who aim to securely manage
events without the hypervisor’s awareness. If a HLT occurs while
a virtual event is pending and the hypervisor is unaware of this
pending event (as could be the case with enlightened guests), the
absence of the Idle HLT intercept feature could result in a vCPU being
suspended indefinitely.

Presence of Idle HLT intercept feature for guests is indicated via CPUID
function 0x8000000A_EDX[30].

Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 645aa360628d..4fef0533f764 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -386,6 +386,7 @@
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* "vnmi" Virtual NMI */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* SVME addr check */
+#define X86_FEATURE_IDLE_HLT		(15*32+30) /* IDLE HLT intercept */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI		(16*32+ 1) /* "avx512vbmi" AVX512 Vector Bit Manipulation instructions*/

base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0
prerequisite-patch-id: cb345fc0d814a351df2b5788b76eee0eef9de549
prerequisite-patch-id: 71806f400cffe09f47d6231cb072cbdbd540de1b
prerequisite-patch-id: 9ea0412aab7ecd8555fcee3e9609dbfe8456d47b
prerequisite-patch-id: 3504df50cdd33958456f2e56139d76867273525c
prerequisite-patch-id: 674e56729a56cc487cb85be1a64ef561eb7bac8a
prerequisite-patch-id: 48e87354f9d6e6bd121ca32ab73cd0d7f1dce74f
prerequisite-patch-id: b32c21df6522a7396baa41d62bcad9479041d97a
prerequisite-patch-id: 0ff4b504e982db7c1dfa8ec6ac485c92a89f4af8
prerequisite-patch-id: 509018dc2fc1657debc641544e86f5a92d04bc1a
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support
  2025-01-28 12:48 [PATCH v6 0/3] Add support for the Idle HLT intercept feature Manali Shukla
  2025-01-28 12:48 ` [PATCH v6 1/3] x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept Manali Shukla
@ 2025-01-28 12:48 ` Manali Shukla
  2025-02-25 22:07   ` Sean Christopherson
  2025-01-28 12:48 ` [PATCH v6 3/3] KVM: selftests: Add self IPI HLT test Manali Shukla
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Manali Shukla @ 2025-01-28 12:48 UTC (permalink / raw)
  To: kvm, linux-kselftest
  Cc: pbonzini, seanjc, shuah, nikunj, thomas.lendacky, vkuznets,
	manali.shukla, bp, babu.moger, neeraj.upadhyay

From: Manali Shukla <Manali.Shukla@amd.com>

The hypervisor can intercept the HLT instruction by setting the
HLT-Intercept Bit in VMCB, causing a VMEXIT. This can be wasteful if
there are pending V_INTR and V_NMI events, as the hypervisor must then
initiate a VMRUN to handle them.

If the HLT-Intercept Bit is cleared and the vCPU executes HLT while
there are pending V_INTR and V_NMI events, the hypervisor won’t detect
them, potentially causing indefinite suspension of the vCPU. This poses
a problem for enlightened guests who  wish to securely handle the
events.

For Secure AVIC scenarios, if a guest does a HLT while an interrupt is
pending (in IRR), the hypervisor does not have a way to figure out
whether the guest needs to be re-entered, as it cannot read the guest
backing page.  The Idle HLT intercept feature allows the hypervisor to
intercept HLT execution only if there are no pending V_INTR and V_NMI
events.

There are two use cases for the Idle HLT intercept feature:
- Secure VMs that wish to handle pending events securely without exiting
  to the hypervisor on HLT (Secure AVIC).
- Optimization for all the VMs to avoid a wasteful VMEXIT during HLT
  when there are pending events.

On discovering the Idle HLT Intercept, the KVM hypervisor,
Sets the Idle HLT Intercept bit (bit (6), offset 0x14h) in the VMCB.
When the Idle HLT Intercept bit is set, HLT Intercept bit (bit (0),
offset 0xFh) should be cleared.

Before entering the HLT state, the HLT instruction performs checks in
following order:
- The HLT intercept check, if set, it unconditionally triggers
  SVM_EXIT_HLT (0x78).
- The Idle HLT intercept check, if set and there are no pending V_INTR
  or V_NMI events, triggers SVM_EXIT_IDLE_HLT (0xA6).

Details about the Idle HLT intercept feature can be found in AMD APM [1].

[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April
     2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
     https://bugzilla.kernel.org/attachment.cgi?id=306250

Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
---
 arch/x86/include/asm/svm.h      |  1 +
 arch/x86/include/uapi/asm/svm.h |  2 ++
 arch/x86/kvm/svm/svm.c          | 13 ++++++++++---
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 2b59b9951c90..992050cb83d0 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -116,6 +116,7 @@ enum {
 	INTERCEPT_INVPCID,
 	INTERCEPT_MCOMMIT,
 	INTERCEPT_TLBSYNC,
+	INTERCEPT_IDLE_HLT = 166,
 };
 
 
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 1814b413fd57..ec1321248dac 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -95,6 +95,7 @@
 #define SVM_EXIT_CR14_WRITE_TRAP		0x09e
 #define SVM_EXIT_CR15_WRITE_TRAP		0x09f
 #define SVM_EXIT_INVPCID       0x0a2
+#define SVM_EXIT_IDLE_HLT      0x0a6
 #define SVM_EXIT_NPF           0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -224,6 +225,7 @@
 	{ SVM_EXIT_CR4_WRITE_TRAP,	"write_cr4_trap" }, \
 	{ SVM_EXIT_CR8_WRITE_TRAP,	"write_cr8_trap" }, \
 	{ SVM_EXIT_INVPCID,     "invpcid" }, \
+	{ SVM_EXIT_IDLE_HLT,     "idle-halt" }, \
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7640a84e554a..456d841298f7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1297,8 +1297,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 		svm_set_intercept(svm, INTERCEPT_MWAIT);
 	}
 
-	if (!kvm_hlt_in_guest(vcpu->kvm))
-		svm_set_intercept(svm, INTERCEPT_HLT);
+	if (!kvm_hlt_in_guest(vcpu->kvm)) {
+		if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT))
+			svm_set_intercept(svm, INTERCEPT_IDLE_HLT);
+		else
+			svm_set_intercept(svm, INTERCEPT_HLT);
+	}
 
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
@@ -3342,6 +3346,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_INVPCID]                      = invpcid_interception,
+	[SVM_EXIT_IDLE_HLT]			= kvm_emulate_halt,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
@@ -3504,7 +3509,7 @@ int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code)
 		return interrupt_window_interception(vcpu);
 	else if (exit_code == SVM_EXIT_INTR)
 		return intr_interception(vcpu);
-	else if (exit_code == SVM_EXIT_HLT)
+	else if (exit_code == SVM_EXIT_HLT || exit_code == SVM_EXIT_IDLE_HLT)
 		return kvm_emulate_halt(vcpu);
 	else if (exit_code == SVM_EXIT_NPF)
 		return npf_interception(vcpu);
@@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void)
 		if (vnmi)
 			kvm_cpu_cap_set(X86_FEATURE_VNMI);
 
+		kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v6 3/3] KVM: selftests: Add self IPI HLT test
  2025-01-28 12:48 [PATCH v6 0/3] Add support for the Idle HLT intercept feature Manali Shukla
  2025-01-28 12:48 ` [PATCH v6 1/3] x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept Manali Shukla
  2025-01-28 12:48 ` [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support Manali Shukla
@ 2025-01-28 12:48 ` Manali Shukla
  2025-01-29  5:26   ` Neeraj Upadhyay
  2025-02-26  0:38   ` Sean Christopherson
  2025-02-10  5:06 ` [PATCH v6 0/3] Add support for the Idle HLT intercept feature Manali Shukla
  2025-02-28 17:06 ` Sean Christopherson
  4 siblings, 2 replies; 16+ messages in thread
From: Manali Shukla @ 2025-01-28 12:48 UTC (permalink / raw)
  To: kvm, linux-kselftest
  Cc: pbonzini, seanjc, shuah, nikunj, thomas.lendacky, vkuznets,
	manali.shukla, bp, babu.moger, neeraj.upadhyay

From: Manali Shukla <Manali.Shukla@amd.com>

The IPI HLT test simulates a scenario where a pending event is present
while the HLT instruction is executed.

This test evaluates the idle HLT intercept feature of the AMD
architecture, if available. If the feature is not present, the test
exercises halt-exits/guest-entry for pending interrupts.  It can be
extended in the future to include cross-vCPU IPI testing.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |  1 +
 .../selftests/kvm/include/x86/processor.h     |  1 +
 tools/testing/selftests/kvm/ipi_hlt_test.c    | 81 +++++++++++++++++++
 3 files changed, 83 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/ipi_hlt_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index 4277b983cace..d6eda8c19fed 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -135,6 +135,7 @@ TEST_GEN_PROGS_x86 += steal_time
 TEST_GEN_PROGS_x86 += kvm_binary_stats_test
 TEST_GEN_PROGS_x86 += system_counter_offset_test
 TEST_GEN_PROGS_x86 += pre_fault_memory_test
+TEST_GEN_PROGS_x86 += ipi_hlt_test
 
 # Compiled outputs used by test targets
 TEST_GEN_PROGS_EXTENDED_x86 += x86/nx_huge_pages_test
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index e28c3e462fa7..fe0ed8d33e37 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -200,6 +200,7 @@ struct kvm_x86_cpu_feature {
 #define X86_FEATURE_PAUSEFILTER         KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 10)
 #define X86_FEATURE_PFTHRESHOLD         KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 12)
 #define	X86_FEATURE_VGIF		KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 16)
+#define X86_FEATURE_IDLE_HLT		KVM_X86_CPU_FEATURE(0x8000000A, 0, EDX, 30)
 #define X86_FEATURE_SEV			KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 1)
 #define X86_FEATURE_SEV_ES		KVM_X86_CPU_FEATURE(0x8000001F, 0, EAX, 3)
 #define	X86_FEATURE_PERFMON_V2		KVM_X86_CPU_FEATURE(0x80000022, 0, EAX, 0)
diff --git a/tools/testing/selftests/kvm/ipi_hlt_test.c b/tools/testing/selftests/kvm/ipi_hlt_test.c
new file mode 100644
index 000000000000..449845fa2a82
--- /dev/null
+++ b/tools/testing/selftests/kvm/ipi_hlt_test.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  Copyright (C) 2024 Advanced Micro Devices, Inc.
+ *
+ */
+#include <kvm_util.h>
+#include <processor.h>
+#include <test_util.h>
+#include "apic.h"
+
+#define INTR_VECTOR     0x30
+#define NUM_ITERATIONS   1000
+
+static bool irq_received;
+
+/*
+ * The guest code instruments the scenario where there is a V_INTR pending
+ * event available while hlt instruction is executed.
+ */
+
+static void guest_code(void)
+{
+	uint64_t icr_val;
+	int i;
+
+	x2apic_enable();
+
+	icr_val = (APIC_DEST_SELF | APIC_INT_ASSERT | INTR_VECTOR);
+
+	for (i = 0; i < NUM_ITERATIONS; i++) {
+		cli();
+		x2apic_write_reg(APIC_ICR, icr_val);
+		safe_halt();
+		GUEST_ASSERT(READ_ONCE(irq_received));
+		WRITE_ONCE(irq_received, false);
+	}
+	GUEST_DONE();
+}
+
+static void guest_intr_handler(struct ex_regs *regs)
+{
+	WRITE_ONCE(irq_received, true);
+	x2apic_write_reg(APIC_EOI, 0x00);
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vm *vm;
+	struct kvm_vcpu *vcpu;
+	struct ucall uc;
+	uint64_t halt_exits;
+
+	TEST_REQUIRE(kvm_has_cap(KVM_CAP_BINARY_STATS_FD));
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+
+	vm_install_exception_handler(vm, INTR_VECTOR, guest_intr_handler);
+
+	vcpu_run(vcpu);
+	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+
+	halt_exits = vcpu_get_stat(vcpu, halt_exits);
+
+	switch (get_ucall(vcpu, &uc)) {
+	case UCALL_ABORT:
+		REPORT_GUEST_ASSERT(uc);
+		/* NOT REACHED */
+	case UCALL_DONE:
+		break;
+	default:
+		TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
+	}
+
+	if (kvm_cpu_has(X86_FEATURE_IDLE_HLT))
+		TEST_ASSERT_EQ(halt_exits, 0);
+	else
+		TEST_ASSERT_EQ(halt_exits, NUM_ITERATIONS);
+
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 3/3] KVM: selftests: Add self IPI HLT test
  2025-01-28 12:48 ` [PATCH v6 3/3] KVM: selftests: Add self IPI HLT test Manali Shukla
@ 2025-01-29  5:26   ` Neeraj Upadhyay
  2025-02-26  0:38   ` Sean Christopherson
  1 sibling, 0 replies; 16+ messages in thread
From: Neeraj Upadhyay @ 2025-01-29  5:26 UTC (permalink / raw)
  To: Manali Shukla, kvm, linux-kselftest
  Cc: pbonzini, seanjc, shuah, nikunj, thomas.lendacky, vkuznets, bp,
	babu.moger



On 1/28/2025 6:18 PM, Manali Shukla wrote:
> From: Manali Shukla <Manali.Shukla@amd.com>
> 
> The IPI HLT test simulates a scenario where a pending event is present
> while the HLT instruction is executed.
> 
> This test evaluates the idle HLT intercept feature of the AMD
> architecture, if available. If the feature is not present, the test
> exercises halt-exits/guest-entry for pending interrupts.  It can be
> extended in the future to include cross-vCPU IPI testing.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
> ---

Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com>


- Neeraj

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/3] Add support for the Idle HLT intercept feature
  2025-01-28 12:48 [PATCH v6 0/3] Add support for the Idle HLT intercept feature Manali Shukla
                   ` (2 preceding siblings ...)
  2025-01-28 12:48 ` [PATCH v6 3/3] KVM: selftests: Add self IPI HLT test Manali Shukla
@ 2025-02-10  5:06 ` Manali Shukla
  2025-02-17  4:43   ` Manali Shukla
  2025-02-28 17:06 ` Sean Christopherson
  4 siblings, 1 reply; 16+ messages in thread
From: Manali Shukla @ 2025-02-10  5:06 UTC (permalink / raw)
  To: kvm, linux-kselftest
  Cc: pbonzini, seanjc, shuah, nikunj, thomas.lendacky, vkuznets, bp,
	babu.moger, neeraj.upadhyay

On 1/28/2025 6:18 PM, Manali Shukla wrote:
> The upcoming new Idle HLT Intercept feature allows for the HLT
> instruction execution by a vCPU to be intercepted by the hypervisor
> only if there are no pending V_INTR and V_NMI events for the vCPU.
> When the vCPU is expected to service the pending V_INTR and V_NMI
> events, the Idle HLT intercept won’t trigger. The feature allows the
> hypervisor to determine if the vCPU is actually idle and reduces
> wasteful VMEXITs.
> 
> The Idle HLT intercept feature is used for enlightened guests who wish
> to securely handle the events. When an enlightened guest does a HLT
> while an interrupt is pending, hypervisor will not have a way to
> figure out whether the guest needs to be re-entered or not. The Idle
> HLT intercept feature allows the HLT execution only if there are no
> pending V_INTR and V_NMI events.
> 
> Presence of the Idle HLT Intercept feature is indicated via CPUID
> function Fn8000_000A_EDX[30].
> 
> Document for the Idle HLT intercept feature is available at [1].
> 
> This series is based on kvm-x86/next (eb723766b103) + [2].
> 
> Testing Done:
> - Tested the functionality for the Idle HLT intercept feature
>   using selftest ipi_hlt_test.
> - Tested on normal, SEV, SEV-ES, SEV-SNP guest for the Idle HLT intercept
>   functionality.
> - Tested the Idle HLT intercept functionality on nested guest.
> 
> v5 -> v6
> - Incorporated Neeraj's review comments on selftest.
> 
> v4 -> v5
> - Incorporated Sean's review comments on nested Idle HLT intercept support.
> - Make svm_idle_hlt_test independent of the Idle HLT to run on all hardware.
> 
> v3 -> v4
> - Drop the patches to add vcpu_get_stat() into a new series [2].
> - Added nested Idle HLT intercept support.
> 
> v2 -> v3
> - Incorporated Andrew's suggestion to structure vcpu_stat_types in
>   a way that each architecture can share the generic types and also
>   provide its own.
> 
> v1 -> v2
> - Did changes in svm_idle_hlt_test based on the review comments from Sean.
> - Added an enum based approach to get binary stats in vcpu_get_stat() which
>   doesn't use string to get stat data based on the comments from Sean.
> - Added safe_halt() and cli() helpers based on the comments from Sean.
> 
> [1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
>      Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
>      https://bugzilla.kernel.org/attachment.cgi?id=306251
> 
> [2]: https://lore.kernel.org/kvm/ee027335-f1b9-4637-bc79-27a610c1ab08@amd.com/T/#u
> 
> ---
> V5: https://lore.kernel.org/kvm/20250103081828.7060-1-manali.shukla@amd.com/
> V4: https://lore.kernel.org/kvm/20241022054810.23369-1-manali.shukla@amd.com/
> V3: https://lore.kernel.org/kvm/20240528041926.3989-4-manali.shukla@amd.com/T/
> V2: https://lore.kernel.org/kvm/20240501145433.4070-1-manali.shukla@amd.com/
> V1: https://lore.kernel.org/kvm/20240307054623.13632-1-manali.shukla@amd.com/
> 
> Manali Shukla (3):
>   x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
>   KVM: SVM: Add Idle HLT intercept support
>   KVM: selftests: Add self IPI HLT test
> 
>  arch/x86/include/asm/cpufeatures.h            |  1 +
>  arch/x86/include/asm/svm.h                    |  1 +
>  arch/x86/include/uapi/asm/svm.h               |  2 +
>  arch/x86/kvm/svm/svm.c                        | 13 ++-
>  tools/testing/selftests/kvm/Makefile.kvm      |  1 +
>  .../selftests/kvm/include/x86/processor.h     |  1 +
>  tools/testing/selftests/kvm/ipi_hlt_test.c    | 81 +++++++++++++++++++
>  7 files changed, 97 insertions(+), 3 deletions(-)
>  create mode 100644 tools/testing/selftests/kvm/ipi_hlt_test.c
> 
> 
> base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0
> prerequisite-patch-id: cb345fc0d814a351df2b5788b76eee0eef9de549
> prerequisite-patch-id: 71806f400cffe09f47d6231cb072cbdbd540de1b
> prerequisite-patch-id: 9ea0412aab7ecd8555fcee3e9609dbfe8456d47b
> prerequisite-patch-id: 3504df50cdd33958456f2e56139d76867273525c
> prerequisite-patch-id: 674e56729a56cc487cb85be1a64ef561eb7bac8a
> prerequisite-patch-id: 48e87354f9d6e6bd121ca32ab73cd0d7f1dce74f
> prerequisite-patch-id: b32c21df6522a7396baa41d62bcad9479041d97a
> prerequisite-patch-id: 0ff4b504e982db7c1dfa8ec6ac485c92a89f4af8
> prerequisite-patch-id: 509018dc2fc1657debc641544e86f5a92d04bc1a

A gentle reminder for the review.

-Manali



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/3] Add support for the Idle HLT intercept feature
  2025-02-10  5:06 ` [PATCH v6 0/3] Add support for the Idle HLT intercept feature Manali Shukla
@ 2025-02-17  4:43   ` Manali Shukla
  0 siblings, 0 replies; 16+ messages in thread
From: Manali Shukla @ 2025-02-17  4:43 UTC (permalink / raw)
  To: kvm, linux-kselftest
  Cc: pbonzini, seanjc, shuah, nikunj, thomas.lendacky, vkuznets, bp,
	babu.moger, neeraj.upadhyay

On 2/10/2025 10:36 AM, Manali Shukla wrote:
> On 1/28/2025 6:18 PM, Manali Shukla wrote:
>> The upcoming new Idle HLT Intercept feature allows for the HLT
>> instruction execution by a vCPU to be intercepted by the hypervisor
>> only if there are no pending V_INTR and V_NMI events for the vCPU.
>> When the vCPU is expected to service the pending V_INTR and V_NMI
>> events, the Idle HLT intercept won’t trigger. The feature allows the
>> hypervisor to determine if the vCPU is actually idle and reduces
>> wasteful VMEXITs.
>>
>> The Idle HLT intercept feature is used for enlightened guests who wish
>> to securely handle the events. When an enlightened guest does a HLT
>> while an interrupt is pending, hypervisor will not have a way to
>> figure out whether the guest needs to be re-entered or not. The Idle
>> HLT intercept feature allows the HLT execution only if there are no
>> pending V_INTR and V_NMI events.
>>
>> Presence of the Idle HLT Intercept feature is indicated via CPUID
>> function Fn8000_000A_EDX[30].
>>
>> Document for the Idle HLT intercept feature is available at [1].
>>
>> This series is based on kvm-x86/next (eb723766b103) + [2].
>>
>> Testing Done:
>> - Tested the functionality for the Idle HLT intercept feature
>>   using selftest ipi_hlt_test.
>> - Tested on normal, SEV, SEV-ES, SEV-SNP guest for the Idle HLT intercept
>>   functionality.
>> - Tested the Idle HLT intercept functionality on nested guest.
>>
>> v5 -> v6
>> - Incorporated Neeraj's review comments on selftest.
>>
>> v4 -> v5
>> - Incorporated Sean's review comments on nested Idle HLT intercept support.
>> - Make svm_idle_hlt_test independent of the Idle HLT to run on all hardware.
>>
>> v3 -> v4
>> - Drop the patches to add vcpu_get_stat() into a new series [2].
>> - Added nested Idle HLT intercept support.
>>
>> v2 -> v3
>> - Incorporated Andrew's suggestion to structure vcpu_stat_types in
>>   a way that each architecture can share the generic types and also
>>   provide its own.
>>
>> v1 -> v2
>> - Did changes in svm_idle_hlt_test based on the review comments from Sean.
>> - Added an enum based approach to get binary stats in vcpu_get_stat() which
>>   doesn't use string to get stat data based on the comments from Sean.
>> - Added safe_halt() and cli() helpers based on the comments from Sean.
>>
>> [1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
>>      Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
>>      https://bugzilla.kernel.org/attachment.cgi?id=306251
>>
>> [2]: https://lore.kernel.org/kvm/ee027335-f1b9-4637-bc79-27a610c1ab08@amd.com/T/#u
>>
>> ---
>> V5: https://lore.kernel.org/kvm/20250103081828.7060-1-manali.shukla@amd.com/
>> V4: https://lore.kernel.org/kvm/20241022054810.23369-1-manali.shukla@amd.com/
>> V3: https://lore.kernel.org/kvm/20240528041926.3989-4-manali.shukla@amd.com/T/
>> V2: https://lore.kernel.org/kvm/20240501145433.4070-1-manali.shukla@amd.com/
>> V1: https://lore.kernel.org/kvm/20240307054623.13632-1-manali.shukla@amd.com/
>>
>> Manali Shukla (3):
>>   x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
>>   KVM: SVM: Add Idle HLT intercept support
>>   KVM: selftests: Add self IPI HLT test
>>
>>  arch/x86/include/asm/cpufeatures.h            |  1 +
>>  arch/x86/include/asm/svm.h                    |  1 +
>>  arch/x86/include/uapi/asm/svm.h               |  2 +
>>  arch/x86/kvm/svm/svm.c                        | 13 ++-
>>  tools/testing/selftests/kvm/Makefile.kvm      |  1 +
>>  .../selftests/kvm/include/x86/processor.h     |  1 +
>>  tools/testing/selftests/kvm/ipi_hlt_test.c    | 81 +++++++++++++++++++
>>  7 files changed, 97 insertions(+), 3 deletions(-)
>>  create mode 100644 tools/testing/selftests/kvm/ipi_hlt_test.c
>>
>>
>> base-commit: eb723766b1030a23c38adf2348b7c3d1409d11f0
>> prerequisite-patch-id: cb345fc0d814a351df2b5788b76eee0eef9de549
>> prerequisite-patch-id: 71806f400cffe09f47d6231cb072cbdbd540de1b
>> prerequisite-patch-id: 9ea0412aab7ecd8555fcee3e9609dbfe8456d47b
>> prerequisite-patch-id: 3504df50cdd33958456f2e56139d76867273525c
>> prerequisite-patch-id: 674e56729a56cc487cb85be1a64ef561eb7bac8a
>> prerequisite-patch-id: 48e87354f9d6e6bd121ca32ab73cd0d7f1dce74f
>> prerequisite-patch-id: b32c21df6522a7396baa41d62bcad9479041d97a
>> prerequisite-patch-id: 0ff4b504e982db7c1dfa8ec6ac485c92a89f4af8
>> prerequisite-patch-id: 509018dc2fc1657debc641544e86f5a92d04bc1a
> 
> A gentle reminder for the review.
> 
> -Manali
> 
> 

A gentle reminder for the review.

-Manali



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support
  2025-01-28 12:48 ` [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support Manali Shukla
@ 2025-02-25 22:07   ` Sean Christopherson
  2025-02-27  8:21     ` Manali Shukla
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2025-02-25 22:07 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, linux-kselftest, pbonzini, shuah, nikunj, thomas.lendacky,
	vkuznets, bp, babu.moger, neeraj.upadhyay

On Tue, Jan 28, 2025, Manali Shukla wrote:
> From: Manali Shukla <Manali.Shukla@amd.com>
> 
> The hypervisor can intercept the HLT instruction by setting the
> HLT-Intercept Bit in VMCB, causing a VMEXIT. This can be wasteful if
> there are pending V_INTR and V_NMI events, as the hypervisor must then
> initiate a VMRUN to handle them.
> 
> If the HLT-Intercept Bit is cleared and the vCPU executes HLT while
> there are pending V_INTR and V_NMI events, the hypervisor won’t detect
> them, potentially causing indefinite suspension of the vCPU. This poses
> a problem for enlightened guests who  wish to securely handle the
> events.
> 
> For Secure AVIC scenarios, if a guest does a HLT while an interrupt is
> pending (in IRR), the hypervisor does not have a way to figure out
> whether the guest needs to be re-entered, as it cannot read the guest
> backing page.  The Idle HLT intercept feature allows the hypervisor to
> intercept HLT execution only if there are no pending V_INTR and V_NMI
> events.
> 
> There are two use cases for the Idle HLT intercept feature:
> - Secure VMs that wish to handle pending events securely without exiting
>   to the hypervisor on HLT (Secure AVIC).

I honestly don't see any reason to talk about Secure AVIC.  It takes a *lot* of
background reading to understand how Secure AVIC actually works with Idle HLT.
Critically, it requires a *reduction* in acceleration relative to what the APM
calls "legacy" AVIC (lol), in that cross-vCPU IPIs are *never* accelerated.
Ignoring device posted interrupts, lack of IPI virtualization means that KVM
always has visibility into *new* interrupts.  The only blind spot is self-IPI and
interrupts that were already made pending by KVM.

> - Optimization for all the VMs to avoid a wasteful VMEXIT during HLT
>   when there are pending events.
> 
> On discovering the Idle HLT Intercept, the KVM hypervisor,
> Sets the Idle HLT Intercept bit (bit (6), offset 0x14h) in the VMCB.
> When the Idle HLT Intercept bit is set, HLT Intercept bit (bit (0),
> offset 0xFh) should be cleared.
> 
> Before entering the HLT state, the HLT instruction performs checks in
> following order:
> - The HLT intercept check, if set, it unconditionally triggers
>   SVM_EXIT_HLT (0x78).
> - The Idle HLT intercept check, if set and there are no pending V_INTR
>   or V_NMI events, triggers SVM_EXIT_IDLE_HLT (0xA6).
> 
> Details about the Idle HLT intercept feature can be found in AMD APM [1].

This is not a useful changelog.  While I do care about AMD's motivation for adding
Idle HLT, it's far, far more important that the changelog cover (a) what is (and
isn't) being changed in KVM, (b) what edge cases are (or aren't) being handled,
and (c) why the KVM implementation is correct.

> 
> [1]: AMD64 Architecture Programmer's Manual Pub. 24593, April
>      2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
>      https://bugzilla.kernel.org/attachment.cgi?id=306250

*sigh*

The inscrutable reference to the APM is longer than the table entry.  Just copy
the table entry.

  When both HLT and Idle HLT intercepts are active at the same time, the HLT
  intercept takes priority. This intercept occurs only if a virtual interrupt
  is not pending (V_INTR or V_NMI).

> Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
> ---
>  arch/x86/include/asm/svm.h      |  1 +
>  arch/x86/include/uapi/asm/svm.h |  2 ++
>  arch/x86/kvm/svm/svm.c          | 13 ++++++++++---
>  3 files changed, 13 insertions(+), 3 deletions(-)

...

> @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void)
>  		if (vnmi)
>  			kvm_cpu_cap_set(X86_FEATURE_VNMI);
>  
> +		kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);

I am 99% certain this is wrong.  Or at the very least, severly lacking an
explanation of why it's correct.  If L1 enables Idle HLT but not HLT interception,
then it is KVM's responsibility to NOT exit to L1 if there is a pending V_IRQ or
V_NMI.

Yeah, it's buggy.  But, it's buggy in part because *existing* KVM support is buggy.
If L1 disables HLT exiting, but it's enabled in KVM, then KVM will run L2 with
HLT exiting and so it becomes KVM's responsibility to check for valid L2 wake events
prior to scheduling out the vCPU if L2 executes HLT.  E.g. nVMX handles this by
reading vmcs02.GUEST_INTERRUPT_STATUS.RVI as part of vmx_has_nested_events().  I
don't see the equivalent in nSVM.

Amusingly, that means Idle HLT is actually a bug fix to some extent.  E.g. if there
is a pending V_IRQ/V_NMI in vmcb02, then running with Idle HLT will naturally do
the right thing, i.e. not hang the vCPU.

Anyways, for now, I think the easiest and best option is to simply skip full nested
support for the moment.

I rewrote the changelog as I was going (I didn't expect to go down so many rabbit
holes), and ended up with the below.  Please review and let me know if I missed
any wrinkles and/or got anything wrong.

---
From 05281b3782b7f880b3afd68e0074bf3abf6d55a7 Mon Sep 17 00:00:00 2001
From: Manali Shukla <Manali.Shukla@amd.com>
Date: Tue, 28 Jan 2025 12:48:11 +0000
Subject: [PATCH] KVM: SVM: Add Idle HLT intercept support

Add support for "Idle HLT" interception on AMD CPUs, and enable Idle HLT
interception instead of "normal" HLT interception for all VMs for which
HLT-exiting is enabled.  Idle HLT provides a mild performance boost for
all VM types, by avoiding a VM-Exit in the scenario where KVM would
immediately "wake" and resume the vCPU.

Idle HLT makes HLT-exiting conditional on the vCPU not having a valid,
unmasked interrupt.  Specifically, a VM-Exit occurs on execution of HLT
if and only if there are no pending V_IRQ or V_NMI events.  Note, Idle
is a replacement for full HLT interception, i.e. enabling HLT interception
would result in all HLT instructions causing unconditional VM-Exits.  Per
the APM:

 When both HLT and Idle HLT intercepts are active at the same time, the
 HLT intercept takes priority. This intercept occurs only if a virtual
 interrupt is not pending (V_INTR or V_NMI).

For KVM's use of V_IRQ (also called V_INTR in the APM) to detect interrupt
windows, the net effect of enabling Idle HLT is that, if a virtual
interupt is pending and unmasked at the time of HLT, the vCPU will take
a V_IRQ intercept instead of a HLT intercept.

When AVIC is enabled, Idle HLT works as intended: the vCPU continues
unimpeded and services the pending virtual interrupt.

Note, the APM's description of V_IRQ interaction with AVIC is quite
confusing, and requires piecing together implied behavior.  Per the APM,
when AVIC is enabled, V_IRQ *from the VMCB* is ignored:

  When AVIC mode is enabled for a virtual processor, the V_IRQ, V_INTR_PRIO,
  V_INTR_VECTOR, and V_IGN_TPR fields in the VMCB are ignored.

Which seems to contradict the behavior of Idle HLT:

  This intercept occurs only if a virtual interrupt is not pending (V_INTR
  or V_NMI).

What's not explicitly stated is that hardware's internal copy of V_IRQ
(and related fields) *are* still active, i.e. are presumably used to cache
information from the virtual APIC.

Handle Idle HLT exits as if they were normal HLT exits, e.g. don't try to
optimize the handling under the assumption that there isn't a pending IRQ.
Irrespective of AVIC, Idle HLT is inherently racy with respect to the vIRR,
as KVM can set vIRR bits asychronously.

No changes are required to support KVM's use Idle HLT while running
L2.  In fact, supporting Idle HLT is actually a bug fix to some extent.
If L1 wants to intercept HLT, recalc_intercepts() will enable HLT
interception in vmcb02 and forward the intercept to L1 as normal.

But if L1 does not want to intercept HLT, then KVM will run L2 with Idle
HLT enabled and HLT interception disabled.  If a V_IRQ or V_NMI for L2
becomes pending and L2 executes HLT, then use of Idle HLT will do the
right thing, i.e. not #VMEXIT and instead deliver the virtual event.  KVM
currently doesn't handle this scenario correctly, e.g. doesn't check V_IRQ
or V_NMI in vmcs02 as part of kvm_vcpu_has_events().

Do not expose Idle HLT to L1 at this time, as supporting nested Idle HLT is
more complex than just enumerating the feature, e.g. requires KVM to handle
the aforementioned scenarios of V_IRQ and V_NMI at the time of exit.

Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
Link: https://bugzilla.kernel.org/attachment.cgi?id=306250
Link: https://lore.kernel.org/r/20250128124812.7324-3-manali.shukla@amd.com
[sean: rewrite changelog, drop nested "support"]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/svm.h      |  1 +
 arch/x86/include/uapi/asm/svm.h |  2 ++
 arch/x86/kvm/svm/svm.c          | 11 ++++++++---
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index e2fac21471f5..12a9dde1e842 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -116,6 +116,7 @@ enum {
 	INTERCEPT_INVPCID,
 	INTERCEPT_MCOMMIT,
 	INTERCEPT_TLBSYNC,
+	INTERCEPT_IDLE_HLT = 166,
 };
 
 
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 1814b413fd57..ec1321248dac 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -95,6 +95,7 @@
 #define SVM_EXIT_CR14_WRITE_TRAP		0x09e
 #define SVM_EXIT_CR15_WRITE_TRAP		0x09f
 #define SVM_EXIT_INVPCID       0x0a2
+#define SVM_EXIT_IDLE_HLT      0x0a6
 #define SVM_EXIT_NPF           0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -224,6 +225,7 @@
 	{ SVM_EXIT_CR4_WRITE_TRAP,	"write_cr4_trap" }, \
 	{ SVM_EXIT_CR8_WRITE_TRAP,	"write_cr8_trap" }, \
 	{ SVM_EXIT_INVPCID,     "invpcid" }, \
+	{ SVM_EXIT_IDLE_HLT,     "idle-halt" }, \
 	{ SVM_EXIT_NPF,         "npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7640a84e554a..37e83bde8f9f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1297,8 +1297,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 		svm_set_intercept(svm, INTERCEPT_MWAIT);
 	}
 
-	if (!kvm_hlt_in_guest(vcpu->kvm))
-		svm_set_intercept(svm, INTERCEPT_HLT);
+	if (!kvm_hlt_in_guest(vcpu->kvm)) {
+		if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT))
+			svm_set_intercept(svm, INTERCEPT_IDLE_HLT);
+		else
+			svm_set_intercept(svm, INTERCEPT_HLT);
+	}
 
 	control->iopm_base_pa = iopm_base;
 	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
@@ -3342,6 +3346,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_INVPCID]                      = invpcid_interception,
+	[SVM_EXIT_IDLE_HLT]			= kvm_emulate_halt,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]                          = rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
@@ -3504,7 +3509,7 @@ int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code)
 		return interrupt_window_interception(vcpu);
 	else if (exit_code == SVM_EXIT_INTR)
 		return intr_interception(vcpu);
-	else if (exit_code == SVM_EXIT_HLT)
+	else if (exit_code == SVM_EXIT_HLT || exit_code == SVM_EXIT_IDLE_HLT)
 		return kvm_emulate_halt(vcpu);
 	else if (exit_code == SVM_EXIT_NPF)
 		return npf_interception(vcpu);

base-commit: b9cd96a7ff9cc9ddf95de59d69afb174a9e90c6e
-- 

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 3/3] KVM: selftests: Add self IPI HLT test
  2025-01-28 12:48 ` [PATCH v6 3/3] KVM: selftests: Add self IPI HLT test Manali Shukla
  2025-01-29  5:26   ` Neeraj Upadhyay
@ 2025-02-26  0:38   ` Sean Christopherson
  2025-02-28 15:05     ` Manali Shukla
  1 sibling, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2025-02-26  0:38 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, linux-kselftest, pbonzini, shuah, nikunj, thomas.lendacky,
	vkuznets, bp, babu.moger, neeraj.upadhyay

On Tue, Jan 28, 2025, Manali Shukla wrote:
> +	if (kvm_cpu_has(X86_FEATURE_IDLE_HLT))

Well, shoot.  I gave you bad input, and we're stuck.

this_cpu_has() isn't correct, because the part of my previous feedback about
needing to check *KVM* support was 100% correct.  But kvm_cpu_has() isn't right
either, because that checks what KVM supports exposing to the guest, not what
KVM itself supports/uses.  E.g. even if we add full nested support, the test would
fail if nested=0 due to KVM not "supporting" Idle HLT despite using it under the
hood.

The lack of a way for KVM to communicate support to the user has come up in the
past, e.g. in discussion around /proc/cpuinfo.  Sadly, AFAIK there are no (good)
ideas on what that should look like.

For now, I'll just skip this patch, even though doing so makes me quite sad.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support
  2025-02-25 22:07   ` Sean Christopherson
@ 2025-02-27  8:21     ` Manali Shukla
  2025-02-27 14:32       ` Sean Christopherson
  0 siblings, 1 reply; 16+ messages in thread
From: Manali Shukla @ 2025-02-27  8:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kselftest, pbonzini, shuah, nikunj, thomas.lendacky,
	vkuznets, bp, babu.moger, neeraj.upadhyay, Manali Shukla

Hi Sean,

Thank you for reviewing my patches.

On 2/26/2025 3:37 AM, Sean Christopherson wrote:
> On Tue, Jan 28, 2025, Manali Shukla wrote:
>> From: Manali Shukla <Manali.Shukla@amd.com>
>>
>> The hypervisor can intercept the HLT instruction by setting the
>> HLT-Intercept Bit in VMCB, causing a VMEXIT. This can be wasteful if
>> there are pending V_INTR and V_NMI events, as the hypervisor must then
>> initiate a VMRUN to handle them.
>>
>> If the HLT-Intercept Bit is cleared and the vCPU executes HLT while
>> there are pending V_INTR and V_NMI events, the hypervisor won’t detect
>> them, potentially causing indefinite suspension of the vCPU. This poses
>> a problem for enlightened guests who  wish to securely handle the
>> events.
>>
>> For Secure AVIC scenarios, if a guest does a HLT while an interrupt is
>> pending (in IRR), the hypervisor does not have a way to figure out
>> whether the guest needs to be re-entered, as it cannot read the guest
>> backing page.  The Idle HLT intercept feature allows the hypervisor to
>> intercept HLT execution only if there are no pending V_INTR and V_NMI
>> events.
>>
>> There are two use cases for the Idle HLT intercept feature:
>> - Secure VMs that wish to handle pending events securely without exiting
>>   to the hypervisor on HLT (Secure AVIC).
> 
> I honestly don't see any reason to talk about Secure AVIC.  It takes a *lot* of
> background reading to understand how Secure AVIC actually works with Idle HLT.
> Critically, it requires a *reduction* in acceleration relative to what the APM
> calls "legacy" AVIC (lol), in that cross-vCPU IPIs are *never* accelerated.
> Ignoring device posted interrupts, lack of IPI virtualization means that KVM
> always has visibility into *new* interrupts.  The only blind spot is self-IPI and
> interrupts that were already made pending by KVM.
> 
>> - Optimization for all the VMs to avoid a wasteful VMEXIT during HLT
>>   when there are pending events.
>>
>> On discovering the Idle HLT Intercept, the KVM hypervisor,
>> Sets the Idle HLT Intercept bit (bit (6), offset 0x14h) in the VMCB.
>> When the Idle HLT Intercept bit is set, HLT Intercept bit (bit (0),
>> offset 0xFh) should be cleared.
>>
>> Before entering the HLT state, the HLT instruction performs checks in
>> following order:
>> - The HLT intercept check, if set, it unconditionally triggers
>>   SVM_EXIT_HLT (0x78).
>> - The Idle HLT intercept check, if set and there are no pending V_INTR
>>   or V_NMI events, triggers SVM_EXIT_IDLE_HLT (0xA6).
>>
>> Details about the Idle HLT intercept feature can be found in AMD APM [1].
> 
> This is not a useful changelog.  While I do care about AMD's motivation for adding
> Idle HLT, it's far, far more important that the changelog cover (a) what is (and
> isn't) being changed in KVM, (b) what edge cases are (or aren't) being handled,
> and (c) why the KVM implementation is correct.
> 
>>
>> [1]: AMD64 Architecture Programmer's Manual Pub. 24593, April
>>      2024, Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
>>      https://bugzilla.kernel.org/attachment.cgi?id=306250
> 
> *sigh*
> 
> The inscrutable reference to the APM is longer than the table entry.  Just copy
> the table entry.
> 
>   When both HLT and Idle HLT intercepts are active at the same time, the HLT
>   intercept takes priority. This intercept occurs only if a virtual interrupt
>   is not pending (V_INTR or V_NMI).
> 
>> Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
>> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
>> ---
>>  arch/x86/include/asm/svm.h      |  1 +
>>  arch/x86/include/uapi/asm/svm.h |  2 ++
>>  arch/x86/kvm/svm/svm.c          | 13 ++++++++++---
>>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> ...
> 
>> @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void)
>>  		if (vnmi)
>>  			kvm_cpu_cap_set(X86_FEATURE_VNMI);
>>  
>> +		kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
> 
> I am 99% certain this is wrong.  Or at the very least, severly lacking an
> explanation of why it's correct.  If L1 enables Idle HLT but not HLT interception,
> then it is KVM's responsibility to NOT exit to L1 if there is a pending V_IRQ or
> V_NMI.
> 
> Yeah, it's buggy.  But, it's buggy in part because *existing* KVM support is buggy.
> If L1 disables HLT exiting, but it's enabled in KVM, then KVM will run L2 with
> HLT exiting and so it becomes KVM's responsibility to check for valid L2 wake events
> prior to scheduling out the vCPU if L2 executes HLT.  E.g. nVMX handles this by
> reading vmcs02.GUEST_INTERRUPT_STATUS.RVI as part of vmx_has_nested_events().  I
> don't see the equivalent in nSVM.
> 
> Amusingly, that means Idle HLT is actually a bug fix to some extent.  E.g. if there
> is a pending V_IRQ/V_NMI in vmcb02, then running with Idle HLT will naturally do
> the right thing, i.e. not hang the vCPU.
> 
> Anyways, for now, I think the easiest and best option is to simply skip full nested
> support for the moment.
> 

Got it, I see the issue you're talking about. I'll need to look into it a bit more to
fully understand it. So yeah, we can hold off on full nested support for idle HLT 
intercept for now.

Since we are planning to disable Idle HLT support on nested guests, should we do
something like this ?

@@ -167,10 +167,15 @@ void recalc_intercepts(struct vcpu_svm *svm)
        if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
                vmcb_clr_intercept(c, INTERCEPT_VMMCALL);

+       if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IDLE_HLT))
+               vmcb_clr_intercept(c, INTERCEPT_IDLE_HLT);
+

When recalc_intercepts copies the intercept values from vmc01 to vmcb02, it also copies
the IDLE HLT intercept bit, which is set to 1 in vmcb01. Normally, this isn't a problem 
because the HLT intercept takes priority when it's on. But if the HLT intercept gets 
turned off for some reason, the IDLE HLT intercept will stay on, which is not what we
want.

Other than that, everything looks good to me. 

> I rewrote the changelog as I was going (I didn't expect to go down so many rabbit
> holes), and ended up with the below.  Please review and let me know if I missed
> any wrinkles and/or got anything wrong.
> 
> ---
> From 05281b3782b7f880b3afd68e0074bf3abf6d55a7 Mon Sep 17 00:00:00 2001
> From: Manali Shukla <Manali.Shukla@amd.com>
> Date: Tue, 28 Jan 2025 12:48:11 +0000
> Subject: [PATCH] KVM: SVM: Add Idle HLT intercept support
> 
> Add support for "Idle HLT" interception on AMD CPUs, and enable Idle HLT
> interception instead of "normal" HLT interception for all VMs for which
> HLT-exiting is enabled.  Idle HLT provides a mild performance boost for
> all VM types, by avoiding a VM-Exit in the scenario where KVM would
> immediately "wake" and resume the vCPU.
> 
> Idle HLT makes HLT-exiting conditional on the vCPU not having a valid,
> unmasked interrupt.  Specifically, a VM-Exit occurs on execution of HLT
> if and only if there are no pending V_IRQ or V_NMI events.  Note, Idle
> is a replacement for full HLT interception, i.e. enabling HLT interception
> would result in all HLT instructions causing unconditional VM-Exits.  Per
> the APM:
> 
>  When both HLT and Idle HLT intercepts are active at the same time, the
>  HLT intercept takes priority. This intercept occurs only if a virtual
>  interrupt is not pending (V_INTR or V_NMI).
> 
> For KVM's use of V_IRQ (also called V_INTR in the APM) to detect interrupt
> windows, the net effect of enabling Idle HLT is that, if a virtual
> interupt is pending and unmasked at the time of HLT, the vCPU will take
> a V_IRQ intercept instead of a HLT intercept.
> 
> When AVIC is enabled, Idle HLT works as intended: the vCPU continues
> unimpeded and services the pending virtual interrupt.
> 
> Note, the APM's description of V_IRQ interaction with AVIC is quite
> confusing, and requires piecing together implied behavior.  Per the APM,
> when AVIC is enabled, V_IRQ *from the VMCB* is ignored:
> 
>   When AVIC mode is enabled for a virtual processor, the V_IRQ, V_INTR_PRIO,
>   V_INTR_VECTOR, and V_IGN_TPR fields in the VMCB are ignored.
> 
> Which seems to contradict the behavior of Idle HLT:
> 
>   This intercept occurs only if a virtual interrupt is not pending (V_INTR
>   or V_NMI).
> 
> What's not explicitly stated is that hardware's internal copy of V_IRQ
> (and related fields) *are* still active, i.e. are presumably used to cache
> information from the virtual APIC.
> 
> Handle Idle HLT exits as if they were normal HLT exits, e.g. don't try to
> optimize the handling under the assumption that there isn't a pending IRQ.
> Irrespective of AVIC, Idle HLT is inherently racy with respect to the vIRR,
> as KVM can set vIRR bits asychronously.
> 
> No changes are required to support KVM's use Idle HLT while running
> L2.  In fact, supporting Idle HLT is actually a bug fix to some extent.
> If L1 wants to intercept HLT, recalc_intercepts() will enable HLT
> interception in vmcb02 and forward the intercept to L1 as normal.
> 
> But if L1 does not want to intercept HLT, then KVM will run L2 with Idle
> HLT enabled and HLT interception disabled.  If a V_IRQ or V_NMI for L2
> becomes pending and L2 executes HLT, then use of Idle HLT will do the
> right thing, i.e. not #VMEXIT and instead deliver the virtual event.  KVM
> currently doesn't handle this scenario correctly, e.g. doesn't check V_IRQ
> or V_NMI in vmcs02 as part of kvm_vcpu_has_events().
> 
> Do not expose Idle HLT to L1 at this time, as supporting nested Idle HLT is
> more complex than just enumerating the feature, e.g. requires KVM to handle
> the aforementioned scenarios of V_IRQ and V_NMI at the time of exit.
> 
> Signed-off-by: Manali Shukla <Manali.Shukla@amd.com>
> Reviewed-by: Nikunj A Dadhania <nikunj@amd.com>
> Link: https://bugzilla.kernel.org/attachment.cgi?id=306250
> Link: https://lore.kernel.org/r/20250128124812.7324-3-manali.shukla@amd.com
> [sean: rewrite changelog, drop nested "support"]
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/include/asm/svm.h      |  1 +
>  arch/x86/include/uapi/asm/svm.h |  2 ++
>  arch/x86/kvm/svm/svm.c          | 11 ++++++++---
>  3 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index e2fac21471f5..12a9dde1e842 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -116,6 +116,7 @@ enum {
>  	INTERCEPT_INVPCID,
>  	INTERCEPT_MCOMMIT,
>  	INTERCEPT_TLBSYNC,
> +	INTERCEPT_IDLE_HLT = 166,
>  };
>  
>  
> diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
> index 1814b413fd57..ec1321248dac 100644
> --- a/arch/x86/include/uapi/asm/svm.h
> +++ b/arch/x86/include/uapi/asm/svm.h
> @@ -95,6 +95,7 @@
>  #define SVM_EXIT_CR14_WRITE_TRAP		0x09e
>  #define SVM_EXIT_CR15_WRITE_TRAP		0x09f
>  #define SVM_EXIT_INVPCID       0x0a2
> +#define SVM_EXIT_IDLE_HLT      0x0a6
>  #define SVM_EXIT_NPF           0x400
>  #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
>  #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
> @@ -224,6 +225,7 @@
>  	{ SVM_EXIT_CR4_WRITE_TRAP,	"write_cr4_trap" }, \
>  	{ SVM_EXIT_CR8_WRITE_TRAP,	"write_cr8_trap" }, \
>  	{ SVM_EXIT_INVPCID,     "invpcid" }, \
> +	{ SVM_EXIT_IDLE_HLT,     "idle-halt" }, \
>  	{ SVM_EXIT_NPF,         "npf" }, \
>  	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,		"avic_incomplete_ipi" }, \
>  	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS,   "avic_unaccelerated_access" }, \
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7640a84e554a..37e83bde8f9f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1297,8 +1297,12 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
>  		svm_set_intercept(svm, INTERCEPT_MWAIT);
>  	}
>  
> -	if (!kvm_hlt_in_guest(vcpu->kvm))
> -		svm_set_intercept(svm, INTERCEPT_HLT);
> +	if (!kvm_hlt_in_guest(vcpu->kvm)) {
> +		if (cpu_feature_enabled(X86_FEATURE_IDLE_HLT))
> +			svm_set_intercept(svm, INTERCEPT_IDLE_HLT);
> +		else
> +			svm_set_intercept(svm, INTERCEPT_HLT);
> +	}
>  
>  	control->iopm_base_pa = iopm_base;
>  	control->msrpm_base_pa = __sme_set(__pa(svm->msrpm));
> @@ -3342,6 +3346,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>  	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
>  	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
>  	[SVM_EXIT_INVPCID]                      = invpcid_interception,
> +	[SVM_EXIT_IDLE_HLT]			= kvm_emulate_halt,
>  	[SVM_EXIT_NPF]				= npf_interception,
>  	[SVM_EXIT_RSM]                          = rsm_interception,
>  	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
> @@ -3504,7 +3509,7 @@ int svm_invoke_exit_handler(struct kvm_vcpu *vcpu, u64 exit_code)
>  		return interrupt_window_interception(vcpu);
>  	else if (exit_code == SVM_EXIT_INTR)
>  		return intr_interception(vcpu);
> -	else if (exit_code == SVM_EXIT_HLT)
> +	else if (exit_code == SVM_EXIT_HLT || exit_code == SVM_EXIT_IDLE_HLT)
>  		return kvm_emulate_halt(vcpu);
>  	else if (exit_code == SVM_EXIT_NPF)
>  		return npf_interception(vcpu);
> 
> base-commit: b9cd96a7ff9cc9ddf95de59d69afb174a9e90c6e


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support
  2025-02-27  8:21     ` Manali Shukla
@ 2025-02-27 14:32       ` Sean Christopherson
  2025-02-27 16:05         ` Manali Shukla
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2025-02-27 14:32 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, linux-kselftest, pbonzini, shuah, nikunj, thomas.lendacky,
	vkuznets, bp, babu.moger, neeraj.upadhyay

On Thu, Feb 27, 2025, Manali Shukla wrote:
> On 2/26/2025 3:37 AM, Sean Christopherson wrote:
> >> @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void)
> >>  		if (vnmi)
> >>  			kvm_cpu_cap_set(X86_FEATURE_VNMI);
> >>  
> >> +		kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
> > 
> > I am 99% certain this is wrong.  Or at the very least, severly lacking an
> > explanation of why it's correct.  If L1 enables Idle HLT but not HLT interception,
> > then it is KVM's responsibility to NOT exit to L1 if there is a pending V_IRQ or
> > V_NMI.
> > 
> > Yeah, it's buggy.  But, it's buggy in part because *existing* KVM support is buggy.
> > If L1 disables HLT exiting, but it's enabled in KVM, then KVM will run L2 with
> > HLT exiting and so it becomes KVM's responsibility to check for valid L2 wake events
> > prior to scheduling out the vCPU if L2 executes HLT.  E.g. nVMX handles this by
> > reading vmcs02.GUEST_INTERRUPT_STATUS.RVI as part of vmx_has_nested_events().  I
> > don't see the equivalent in nSVM.
> > 
> > Amusingly, that means Idle HLT is actually a bug fix to some extent.  E.g. if there
> > is a pending V_IRQ/V_NMI in vmcb02, then running with Idle HLT will naturally do
> > the right thing, i.e. not hang the vCPU.
> > 
> > Anyways, for now, I think the easiest and best option is to simply skip full nested
> > support for the moment.
> > 
> 
> Got it, I see the issue you're talking about. I'll need to look into it a bit more to
> fully understand it. So yeah, we can hold off on full nested support for idle HLT 
> intercept for now.
> 
> Since we are planning to disable Idle HLT support on nested guests, should we do
> something like this ?
> 
> @@ -167,10 +167,15 @@ void recalc_intercepts(struct vcpu_svm *svm)
>         if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
>                 vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
> 
> +       if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IDLE_HLT))
> +               vmcb_clr_intercept(c, INTERCEPT_IDLE_HLT);
> +
> 
> When recalc_intercepts copies the intercept values from vmc01 to vmcb02, it also copies
> the IDLE HLT intercept bit, which is set to 1 in vmcb01. Normally, this isn't a problem 
> because the HLT intercept takes priority when it's on. But if the HLT intercept gets 
> turned off for some reason, the IDLE HLT intercept will stay on, which is not what we
> want.

Why don't we want that?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support
  2025-02-27 14:32       ` Sean Christopherson
@ 2025-02-27 16:05         ` Manali Shukla
  2025-02-27 16:11           ` Sean Christopherson
  0 siblings, 1 reply; 16+ messages in thread
From: Manali Shukla @ 2025-02-27 16:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kselftest, pbonzini, shuah, nikunj, thomas.lendacky,
	vkuznets, bp, babu.moger, neeraj.upadhyay, Manali Shukla

On 2/27/2025 8:02 PM, Sean Christopherson wrote:
> On Thu, Feb 27, 2025, Manali Shukla wrote:
>> On 2/26/2025 3:37 AM, Sean Christopherson wrote:
>>>> @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void)
>>>>  		if (vnmi)
>>>>  			kvm_cpu_cap_set(X86_FEATURE_VNMI);
>>>>  
>>>> +		kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
>>>
>>> I am 99% certain this is wrong.  Or at the very least, severly lacking an
>>> explanation of why it's correct.  If L1 enables Idle HLT but not HLT interception,
>>> then it is KVM's responsibility to NOT exit to L1 if there is a pending V_IRQ or
>>> V_NMI.
>>>
>>> Yeah, it's buggy.  But, it's buggy in part because *existing* KVM support is buggy.
>>> If L1 disables HLT exiting, but it's enabled in KVM, then KVM will run L2 with
>>> HLT exiting and so it becomes KVM's responsibility to check for valid L2 wake events
>>> prior to scheduling out the vCPU if L2 executes HLT.  E.g. nVMX handles this by
>>> reading vmcs02.GUEST_INTERRUPT_STATUS.RVI as part of vmx_has_nested_events().  I
>>> don't see the equivalent in nSVM.
>>>
>>> Amusingly, that means Idle HLT is actually a bug fix to some extent.  E.g. if there
>>> is a pending V_IRQ/V_NMI in vmcb02, then running with Idle HLT will naturally do
>>> the right thing, i.e. not hang the vCPU.
>>>
>>> Anyways, for now, I think the easiest and best option is to simply skip full nested
>>> support for the moment.
>>>
>>
>> Got it, I see the issue you're talking about. I'll need to look into it a bit more to
>> fully understand it. So yeah, we can hold off on full nested support for idle HLT 
>> intercept for now.
>>
>> Since we are planning to disable Idle HLT support on nested guests, should we do
>> something like this ?
>>
>> @@ -167,10 +167,15 @@ void recalc_intercepts(struct vcpu_svm *svm)
>>         if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
>>                 vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
>>
>> +       if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IDLE_HLT))
>> +               vmcb_clr_intercept(c, INTERCEPT_IDLE_HLT);
>> +
>>
>> When recalc_intercepts copies the intercept values from vmc01 to vmcb02, it also copies
>> the IDLE HLT intercept bit, which is set to 1 in vmcb01. Normally, this isn't a problem 
>> because the HLT intercept takes priority when it's on. But if the HLT intercept gets 
>> turned off for some reason, the IDLE HLT intercept will stay on, which is not what we
>> want.
> 
> Why don't we want that?

The idle-HLT intercept remains '1' for the L2 guest. Now, when L2 executes HLT and there
is no pending event available, it will still do idle-HLT exit, although Idle HLT
was never explicitly enabled on L2 guest.

I found this behavior by modifying the ipi_hlt_test a bit.

static void l2_guest_code(void)
{
        uint64_t icr_val;
        int i;

        x2apic_enable();

        icr_val = (APIC_DEST_SELF | APIC_INT_ASSERT | INTR_VECTOR);

        for (i = 0; i < NUM_ITERATIONS; i++) {
                cli();
                x2apic_write_reg(APIC_ICR, icr_val);
                safe_halt();
                GUEST_ASSERT(READ_ONCE(irq_received));
                WRITE_ONCE(irq_received, false);
                asm volatile("hlt");
        }
        GUEST_DONE();
}
static void l1_svm_code(struct svm_test_data *svm)
{
        unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
        struct vmcb *vmcb = svm->vmcb;

        generic_svm_setup(svm, l2_guest_code,
                          &l2_guest_stack[L2_GUEST_STACK_SIZE]);
        vmcb->control.intercept &= ~BIT(INTERCEPT_HLT);

        run_guest(vmcb, svm->vmcb_gpa);
        GUEST_DONE();
}  

Let me know if I am missing something.

-Manali


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support
  2025-02-27 16:05         ` Manali Shukla
@ 2025-02-27 16:11           ` Sean Christopherson
  2025-02-28 14:58             ` Manali Shukla
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2025-02-27 16:11 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, linux-kselftest, pbonzini, shuah, nikunj, thomas.lendacky,
	vkuznets, bp, babu.moger, neeraj.upadhyay

On Thu, Feb 27, 2025, Manali Shukla wrote:
> On 2/27/2025 8:02 PM, Sean Christopherson wrote:
> > On Thu, Feb 27, 2025, Manali Shukla wrote:
> >> On 2/26/2025 3:37 AM, Sean Christopherson wrote:
> >>>> @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void)
> >>>>  		if (vnmi)
> >>>>  			kvm_cpu_cap_set(X86_FEATURE_VNMI);
> >>>>  
> >>>> +		kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
> >>>
> >>> I am 99% certain this is wrong.  Or at the very least, severly lacking an
> >>> explanation of why it's correct.  If L1 enables Idle HLT but not HLT interception,
> >>> then it is KVM's responsibility to NOT exit to L1 if there is a pending V_IRQ or
> >>> V_NMI.
> >>>
> >>> Yeah, it's buggy.  But, it's buggy in part because *existing* KVM support is buggy.
> >>> If L1 disables HLT exiting, but it's enabled in KVM, then KVM will run L2 with
> >>> HLT exiting and so it becomes KVM's responsibility to check for valid L2 wake events
> >>> prior to scheduling out the vCPU if L2 executes HLT.  E.g. nVMX handles this by
> >>> reading vmcs02.GUEST_INTERRUPT_STATUS.RVI as part of vmx_has_nested_events().  I
> >>> don't see the equivalent in nSVM.
> >>>
> >>> Amusingly, that means Idle HLT is actually a bug fix to some extent.  E.g. if there
> >>> is a pending V_IRQ/V_NMI in vmcb02, then running with Idle HLT will naturally do
> >>> the right thing, i.e. not hang the vCPU.
> >>>
> >>> Anyways, for now, I think the easiest and best option is to simply skip full nested
> >>> support for the moment.
> >>>
> >>
> >> Got it, I see the issue you're talking about. I'll need to look into it a bit more to
> >> fully understand it. So yeah, we can hold off on full nested support for idle HLT 
> >> intercept for now.
> >>
> >> Since we are planning to disable Idle HLT support on nested guests, should we do
> >> something like this ?
> >>
> >> @@ -167,10 +167,15 @@ void recalc_intercepts(struct vcpu_svm *svm)
> >>         if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
> >>                 vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
> >>
> >> +       if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IDLE_HLT))
> >> +               vmcb_clr_intercept(c, INTERCEPT_IDLE_HLT);
> >> +
> >>
> >> When recalc_intercepts copies the intercept values from vmc01 to vmcb02, it also copies
> >> the IDLE HLT intercept bit, which is set to 1 in vmcb01. Normally, this isn't a problem 
> >> because the HLT intercept takes priority when it's on. But if the HLT intercept gets 
> >> turned off for some reason, the IDLE HLT intercept will stay on, which is not what we
> >> want.
> > 
> > Why don't we want that?
> 
> The idle-HLT intercept remains '1' for the L2 guest. Now, when L2 executes HLT and there
> is no pending event available, it will still do idle-HLT exit, although Idle HLT
> was never explicitly enabled on L2 guest.

Yes, but why is that a problem?  L1 doesn't want to intercept HLT, so KVM isn't
violating the architecture by not intercepting HLT.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support
  2025-02-27 16:11           ` Sean Christopherson
@ 2025-02-28 14:58             ` Manali Shukla
  0 siblings, 0 replies; 16+ messages in thread
From: Manali Shukla @ 2025-02-28 14:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kselftest, pbonzini, shuah, nikunj, thomas.lendacky,
	vkuznets, bp, babu.moger, neeraj.upadhyay, Manali Shukla

On 2/27/2025 9:41 PM, Sean Christopherson wrote:
> On Thu, Feb 27, 2025, Manali Shukla wrote:
>> On 2/27/2025 8:02 PM, Sean Christopherson wrote:
>>> On Thu, Feb 27, 2025, Manali Shukla wrote:
>>>> On 2/26/2025 3:37 AM, Sean Christopherson wrote:
>>>>>> @@ -5225,6 +5230,8 @@ static __init void svm_set_cpu_caps(void)
>>>>>>  		if (vnmi)
>>>>>>  			kvm_cpu_cap_set(X86_FEATURE_VNMI);
>>>>>>  
>>>>>> +		kvm_cpu_cap_check_and_set(X86_FEATURE_IDLE_HLT);
>>>>>
>>>>> I am 99% certain this is wrong.  Or at the very least, severly lacking an
>>>>> explanation of why it's correct.  If L1 enables Idle HLT but not HLT interception,
>>>>> then it is KVM's responsibility to NOT exit to L1 if there is a pending V_IRQ or
>>>>> V_NMI.
>>>>>
>>>>> Yeah, it's buggy.  But, it's buggy in part because *existing* KVM support is buggy.
>>>>> If L1 disables HLT exiting, but it's enabled in KVM, then KVM will run L2 with
>>>>> HLT exiting and so it becomes KVM's responsibility to check for valid L2 wake events
>>>>> prior to scheduling out the vCPU if L2 executes HLT.  E.g. nVMX handles this by
>>>>> reading vmcs02.GUEST_INTERRUPT_STATUS.RVI as part of vmx_has_nested_events().  I
>>>>> don't see the equivalent in nSVM.
>>>>>
>>>>> Amusingly, that means Idle HLT is actually a bug fix to some extent.  E.g. if there
>>>>> is a pending V_IRQ/V_NMI in vmcb02, then running with Idle HLT will naturally do
>>>>> the right thing, i.e. not hang the vCPU.
>>>>>
>>>>> Anyways, for now, I think the easiest and best option is to simply skip full nested
>>>>> support for the moment.
>>>>>
>>>>
>>>> Got it, I see the issue you're talking about. I'll need to look into it a bit more to
>>>> fully understand it. So yeah, we can hold off on full nested support for idle HLT 
>>>> intercept for now.
>>>>
>>>> Since we are planning to disable Idle HLT support on nested guests, should we do
>>>> something like this ?
>>>>
>>>> @@ -167,10 +167,15 @@ void recalc_intercepts(struct vcpu_svm *svm)
>>>>         if (!nested_svm_l2_tlb_flush_enabled(&svm->vcpu))
>>>>                 vmcb_clr_intercept(c, INTERCEPT_VMMCALL);
>>>>
>>>> +       if (!guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_IDLE_HLT))
>>>> +               vmcb_clr_intercept(c, INTERCEPT_IDLE_HLT);
>>>> +
>>>>
>>>> When recalc_intercepts copies the intercept values from vmc01 to vmcb02, it also copies
>>>> the IDLE HLT intercept bit, which is set to 1 in vmcb01. Normally, this isn't a problem 
>>>> because the HLT intercept takes priority when it's on. But if the HLT intercept gets 
>>>> turned off for some reason, the IDLE HLT intercept will stay on, which is not what we
>>>> want.
>>>
>>> Why don't we want that?
>>
>> The idle-HLT intercept remains '1' for the L2 guest. Now, when L2 executes HLT and there
>> is no pending event available, it will still do idle-HLT exit, although Idle HLT
>> was never explicitly enabled on L2 guest.
> 
> Yes, but why is that a problem?  L1 doesn't want to intercept HLT, so KVM isn't
> violating the architecture by not intercepting HLT.

What I initially understood was, since we're not showing the Idle HLT Intercept CPUID feature bit
to L1, to fully turn off Idle HLT support for nested guests, we should completely disable the 
Idle HLT intercept for L2. So, when L2 runs HLT and L1 doesn't intercept it, it should trigger
an HLT exit to L0, just like it does now.
 
But after thinking a bit more, now I get what you are saying. KVM has enabled the IDLE HLT intercept
for the L1 guest. So, when the HLT intercept is turned off for the L2 guest and L2 runs the HLT
instruction, it does an idle HLT exit to KVM. That way, KVM is working as per the IDLE HLT design.
And as you explained , the existing buggy behavior is also avoided with idle halt intercept
being kept enabled for L2 to some extent.

-Manali


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 3/3] KVM: selftests: Add self IPI HLT test
  2025-02-26  0:38   ` Sean Christopherson
@ 2025-02-28 15:05     ` Manali Shukla
  0 siblings, 0 replies; 16+ messages in thread
From: Manali Shukla @ 2025-02-28 15:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kselftest, pbonzini, shuah, nikunj, thomas.lendacky,
	vkuznets, bp, babu.moger, neeraj.upadhyay

Hi Sean,

Thank you for reviewing my patches.

On 2/26/2025 6:08 AM, Sean Christopherson wrote:
> On Tue, Jan 28, 2025, Manali Shukla wrote:
>> +	if (kvm_cpu_has(X86_FEATURE_IDLE_HLT))
> 
> Well, shoot.  I gave you bad input, and we're stuck.
> 
> this_cpu_has() isn't correct, because the part of my previous feedback about
> needing to check *KVM* support was 100% correct.  But kvm_cpu_has() isn't right
> either, because that checks what KVM supports exposing to the guest, not what
> KVM itself supports/uses.  E.g. even if we add full nested support, the test would
> fail if nested=0 due to KVM not "supporting" Idle HLT despite using it under the
> hood.
> 
> The lack of a way for KVM to communicate support to the user has come up in the
> past, e.g. in discussion around /proc/cpuinfo.  Sadly, AFAIK there are no (good)
> ideas on what that should look like.
> 
> For now, I'll just skip this patch, even though doing so makes me quite sad.

I understand your point.

-Manali

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v6 0/3] Add support for the Idle HLT intercept feature
  2025-01-28 12:48 [PATCH v6 0/3] Add support for the Idle HLT intercept feature Manali Shukla
                   ` (3 preceding siblings ...)
  2025-02-10  5:06 ` [PATCH v6 0/3] Add support for the Idle HLT intercept feature Manali Shukla
@ 2025-02-28 17:06 ` Sean Christopherson
  4 siblings, 0 replies; 16+ messages in thread
From: Sean Christopherson @ 2025-02-28 17:06 UTC (permalink / raw)
  To: Sean Christopherson, kvm, linux-kselftest, Manali Shukla
  Cc: pbonzini, shuah, nikunj, thomas.lendacky, vkuznets, bp,
	babu.moger, neeraj.upadhyay

On Tue, 28 Jan 2025 12:48:09 +0000, Manali Shukla wrote:
> The upcoming new Idle HLT Intercept feature allows for the HLT
> instruction execution by a vCPU to be intercepted by the hypervisor
> only if there are no pending V_INTR and V_NMI events for the vCPU.
> When the vCPU is expected to service the pending V_INTR and V_NMI
> events, the Idle HLT intercept won’t trigger. The feature allows the
> hypervisor to determine if the vCPU is actually idle and reduces
> wasteful VMEXITs.
> 
> [...]

Applied 1-2 to kvm-x86 svm (with the rewritten changelog).

[1/3] x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
      https://github.com/kvm-x86/linux/commit/70792aed1455
[2/3] KVM: SVM: Add Idle HLT intercept support
      https://github.com/kvm-x86/linux/commit/fa662c908073
[3/3] KVM: selftests: Add self IPI HLT test
      (no commit info)

--
https://github.com/kvm-x86/linux/tree/next

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-02-28 17:07 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-28 12:48 [PATCH v6 0/3] Add support for the Idle HLT intercept feature Manali Shukla
2025-01-28 12:48 ` [PATCH v6 1/3] x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept Manali Shukla
2025-01-28 12:48 ` [PATCH v6 2/3] KVM: SVM: Add Idle HLT intercept support Manali Shukla
2025-02-25 22:07   ` Sean Christopherson
2025-02-27  8:21     ` Manali Shukla
2025-02-27 14:32       ` Sean Christopherson
2025-02-27 16:05         ` Manali Shukla
2025-02-27 16:11           ` Sean Christopherson
2025-02-28 14:58             ` Manali Shukla
2025-01-28 12:48 ` [PATCH v6 3/3] KVM: selftests: Add self IPI HLT test Manali Shukla
2025-01-29  5:26   ` Neeraj Upadhyay
2025-02-26  0:38   ` Sean Christopherson
2025-02-28 15:05     ` Manali Shukla
2025-02-10  5:06 ` [PATCH v6 0/3] Add support for the Idle HLT intercept feature Manali Shukla
2025-02-17  4:43   ` Manali Shukla
2025-02-28 17:06 ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox