linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v4 0/2] KVM: Add support for the ERAPS feature
From: Amit Shah @ 2025-04-02  8:28 UTC
  To: linux-kernel, kvm, x86, linux-doc
  Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe,
	pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc,
	pbonzini, daniel.sneddon, kai.huang, sandipan.das,
	boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3,
	Amit Shah

Zen5+ AMD CPUs have a larger RSB (64 entries on Zen5), and use all of it in
the host context.  The hypervisor needs to set up a couple of things before
the larger RSB is exposed to guests.  Patch 1 adds that support.

The feature also adds host/guest tagging to entries in the RSB, which helps
with preserving RSB entries instead of flushing them across VMEXITs.  The
patches at

https://lore.kernel.org/kvm/cover.1732219175.git.jpoimboe@kernel.org/ 

address that.

The feature isn't yet part of an APM update that details its working, so this
is still tagged as RFC.  The notes at

https://amitshah.net/2024/11/eraps-reduces-software-tax-for-hardware-bugs/

may help follow along till the APM is public.

Patch 2 is something I used for development and debugging.  I don't intend
to submit it for inclusion, but let me know if you think it's useful and
I'll prepare it for final inclusion as well.

There is one thing I'm not sure about, though, and would like clarification
on.  Quoting from my reply to the v3 series:

When EPT/NPT is disabled, and shadow MMU is used by kvm, the CR3
register on the CPU holds the PGD of the qemu process.  So if a task
switch happens within the guest, the CR3 on the CPU is not updated, but
KVM's shadow MMU routines change the page tables pointed to by that
CR3.  In contrast, in the NPT case, the CPU's CR3 holds the guest PGD
directly, and task switches within the guest cause an update to the
CPU's CR3.

Am I misremembering and misreading the code?

v4:
* Address Sean's comments from v3
  * remove a bunch of comments in favour of a better commit message
* Drop patch 1 from the series - Josh's patches handle the most common case,
  and the AutoIBRS-disabled case can be tackled later if required after Josh's
  patches have been merged upstream.

v3:
* rebase on top of Josh's RSB tweaks series
  * with that rebase, only the non-AutoIBRS case needs special ERAPS support.
    AutoIBRS is currently disabled when SEV-SNP is active (commit acaa4b5c4c8)

* remove comment about RSB_CLEAR_LOOPS and the size of the RSB -- it's not
  necessary anymore with the rework

* remove comment from patch 2 in svm.c in favour of the commit message

v2:
* reword comments to highlight context switch as the main trigger for RSB
  flushes in hardware (Dave Hansen)
* Split out outdated comment updates in (v1) patch1 to be a standalone
  patch1 in this series, to reinforce RSB filling is only required for RSB
  poisoning cases for AMD
  * Remove mentions of BTC/BTC_NO (Andrew Cooper)
* Add braces in case stmt (kernel test robot)
* s/boot_cpu_has/cpu_feature_enabled (Boris Petkov)



Amit Shah (2):
  x86: kvm: svm: set up ERAPS support for guests
  debug: add tracepoint for flush_rap_on_vmrun

 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/svm.h         |  6 +++++-
 arch/x86/kvm/cpuid.c               | 10 +++++++++-
 arch/x86/kvm/svm/svm.c             |  9 +++++++++
 arch/x86/kvm/svm/svm.h             | 15 +++++++++++++++
 arch/x86/kvm/trace.h               | 16 ++++++++++++++++
 arch/x86/kvm/x86.c                 |  1 +
 7 files changed, 56 insertions(+), 2 deletions(-)

-- 
2.49.0



* [RFC PATCH v4 1/2] x86: kvm: svm: set up ERAPS support for guests
From: Amit Shah @ 2025-04-02  8:28 UTC
  To: linux-kernel, kvm, x86, linux-doc
  Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe,
	pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc,
	pbonzini, daniel.sneddon, kai.huang, sandipan.das,
	boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

From: Amit Shah <amit.shah@amd.com>

AMD CPUs with the Enhanced Return Address Predictor (ERAPS) feature
(Zen5+) obviate the need for FILL_RETURN_BUFFER sequences right after
VMEXITs.  The feature adds guest/host tags to entries in the RSB (a.k.a.
RAP).  This helps with speculation protection across the VM boundary,
and it also preserves host and guest entries in the RSB, which would
otherwise be flushed by the FILL_RETURN_BUFFER sequences; keeping them
can improve software performance.  This feature also extends the size of
the RSB from the older standard (of 32 entries) to a new default
enumerated in CPUID leaf 0x80000021:EBX bits 23:16 -- which is 64
entries in Zen5 CPUs.
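
As an illustration only (not part of the patch), the field extraction
described above can be sketched in plain userspace C.  The helper names
rap_size_field() and guest_ebx() are made up for this example, and this
thread does not spell out how the raw field encodes the entry count, so
the sketch returns the raw value without interpreting it.  guest_ebx()
models the masking done in the cpuid.c hunk of this patch: only bits
23:16 are passed through when ERAPS is exposed, and EBX is zeroed
otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID 0x80000021 EBX bits 23:16, per the commit message. */
#define RAP_SIZE_SHIFT	16
#define RAP_SIZE_MASK	0xffu

/* Hypothetical helper: extract the raw 8-bit RAP size field from EBX. */
static inline uint32_t rap_size_field(uint32_t ebx)
{
	return (ebx >> RAP_SIZE_SHIFT) & RAP_SIZE_MASK;
}

/*
 * Hypothetical helper: the EBX value a guest would see.  Only the RAP
 * size field survives when ERAPS is exposed; otherwise EBX reads as 0.
 */
static inline uint32_t guest_ebx(uint32_t host_ebx, int eraps_exposed)
{
	if (!eraps_exposed)
		return 0;
	return host_ebx & (RAP_SIZE_MASK << RAP_SIZE_SHIFT);
}

/* Small self-check; the EBX value is invented test data, not from hardware. */
int rap_size_selfcheck(void)
{
	uint32_t host_ebx = 0x12ab34cdu;

	assert(rap_size_field(host_ebx) == 0xab);
	assert(guest_ebx(host_ebx, 1) == 0x00ab0000u);
	assert(guest_ebx(host_ebx, 0) == 0);
	return 0;
}
```

Calling rap_size_selfcheck() from a test program exercises both helpers;
a real consumer would feed in the EBX value returned by the CPUID
instruction instead of the invented constant.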

Additional note, not relevant for the hypervisor use case: CPUs with
this feature also flush the RSB when CR3 is updated (i.e. whenever
there's a context switch), to prevent one userspace process from
poisoning the RSB in a way that may affect another process.

The hardware feature is always-on, and the host context uses the full
default RSB size without any software changes necessary.  The presence
of this feature allows software (both in host and guest contexts) to
drop all RSB filling routines in favour of the hardware doing it.
However, guests continue to use the older default RSB size and behaviour
for backwards compatibility.  The hypervisor needs to set a bit in the
VMCB, in addition to exposing the CPUID bits, to allow guests to also use
the full default RSB size in addition to hardware RSB flushes.

There are two guest/host configurations that need to be addressed before
allowing a guest to use this feature: nested guests, and hosts using
shadow paging (or when NPT is disabled):

1. Nested guests: the ERAPS feature adds host/guest tagging to entries
   in the RSB, but does not distinguish between the guest ASIDs.  To
   prevent the case of an L2 guest poisoning the RSB to attack the L1
   guest, the CPU exposes a new VMCB bit (FLUSH_RAP_ON_VMRUN) that the
   hypervisor sets on a nested exit.  This results in the CPU flushing
   the contents of the RSB tagged 'guest' to protect the L1 guest from
   the L2 guest.

2. Hosts that disable NPT: the ERAPS feature also flushes the RSB
   entries when the CR3 is updated.  When using shadow paging, CR3
   updates within the guest do not update the CPU's CR3 register.  In
   this case, do not expose the ERAPS feature to guests, so the guests
   continue to fill the RSB.

This patch to KVM ensures both those conditions are met, and sets the
new ALLOW_LARGER_RAP VMCB bit that exposes this feature to the guest.
That allows the new default RSB size to be used in guest contexts as
well, and allows the guest to drop its RSB flushing routines.
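
To make the two conditions easier to follow, here is a minimal userspace
model of the erap_ctl byte.  It is a sketch, not the kernel code: struct
erap_model and the model_*() helpers are hypothetical names, and only the
two bit positions are taken from the svm.h hunk.  How and when hardware
consumes or clears the flush bit is not covered in this thread, so the
model only tracks what software sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the svm.h hunk in this patch. */
#define ERAP_CONTROL_ALLOW_LARGER_RAP	(1u << 0)
#define ERAP_CONTROL_FLUSH_RAP		(1u << 1)

/* Hypothetical stand-in for the single VMCB control byte. */
struct erap_model {
	uint8_t erap_ctl;
};

/* Condition 2: allow the larger RAP only with ERAPS and NPT enabled. */
static inline void model_init(struct erap_model *m, bool cpu_has_eraps,
			      bool npt_enabled)
{
	m->erap_ctl = 0;
	if (cpu_has_eraps && npt_enabled)
		m->erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;
}

/* Condition 1: on a nested exit, request a flush of guest-tagged entries. */
static inline void model_nested_exit(struct erap_model *m)
{
	if (m->erap_ctl & ERAP_CONTROL_ALLOW_LARGER_RAP)
		m->erap_ctl |= ERAP_CONTROL_FLUSH_RAP;
}

/* Small self-check covering the NPT and shadow-paging cases. */
int erap_model_selfcheck(void)
{
	struct erap_model m;

	/* NPT enabled: larger RAP allowed, nested exit requests a flush. */
	model_init(&m, true, true);
	assert(m.erap_ctl == ERAP_CONTROL_ALLOW_LARGER_RAP);
	model_nested_exit(&m);
	assert(m.erap_ctl & ERAP_CONTROL_FLUSH_RAP);

	/* Shadow paging: feature stays off, so no flush is requested. */
	model_init(&m, true, false);
	model_nested_exit(&m);
	assert(m.erap_ctl == 0);
	return 0;
}
```

The model keeps the flush request conditional on ALLOW_LARGER_RAP, which
matches the vmcb_is_extended_rap() check in the svm.c hunk.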

Signed-off-by: Amit Shah <amit.shah@amd.com>
---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/include/asm/svm.h         |  6 +++++-
 arch/x86/kvm/cpuid.c               | 10 +++++++++-
 arch/x86/kvm/svm/svm.c             |  7 +++++++
 arch/x86/kvm/svm/svm.h             | 15 +++++++++++++++
 5 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 6c2c152d8a67..25c82d8fcf16 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -457,6 +457,7 @@
 #define X86_FEATURE_AUTOIBRS		(20*32+ 8) /* Automatic IBRS */
 #define X86_FEATURE_NO_SMM_CTL_MSR	(20*32+ 9) /* SMM_CTL MSR is not present */
 
+#define X86_FEATURE_ERAPS		(20*32+24) /* Enhanced Return Address Predictor Security */
 #define X86_FEATURE_SBPB		(20*32+27) /* Selective Branch Prediction Barrier */
 #define X86_FEATURE_IBPB_BRTYPE		(20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
 #define X86_FEATURE_SRSO_NO		(20*32+29) /* CPU is not affected by SRSO */
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 9b7fa99ae951..cf6a94e64e58 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -130,7 +130,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u64 tsc_offset;
 	u32 asid;
 	u8 tlb_ctl;
-	u8 reserved_2[3];
+	u8 erap_ctl;
+	u8 reserved_2[2];
 	u32 int_ctl;
 	u32 int_vector;
 	u32 int_state;
@@ -176,6 +177,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define TLB_CONTROL_FLUSH_ASID 3
 #define TLB_CONTROL_FLUSH_ASID_LOCAL 7
 
+#define ERAP_CONTROL_ALLOW_LARGER_RAP BIT(0)
+#define ERAP_CONTROL_FLUSH_RAP BIT(1)
+
 #define V_TPR_MASK 0x0f
 
 #define V_IRQ_SHIFT 8
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5e4d4934c0d3..9662c055d9d8 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1187,6 +1187,9 @@ void kvm_set_cpu_caps(void)
 		F(SRSO_USER_KERNEL_NO),
 	);
 
+	if (tdp_enabled)
+		kvm_cpu_cap_check_and_set(X86_FEATURE_ERAPS);
+
 	kvm_cpu_cap_init(CPUID_8000_0022_EAX,
 		F(PERFMON_V2),
 	);
@@ -1758,8 +1761,13 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 		break;
 	case 0x80000021:
-		entry->ebx = entry->ecx = entry->edx = 0;
+		entry->ecx = entry->edx = 0;
 		cpuid_entry_override(entry, CPUID_8000_0021_EAX);
+		if (kvm_cpu_cap_has(X86_FEATURE_ERAPS))
+			entry->ebx &= GENMASK(23, 16);
+		else
+			entry->ebx = 0;
+
 		break;
 	/* AMD Extended Performance Monitoring and Debug */
 	case 0x80000022: {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d5d0c5c3300b..b5de6341080b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1369,6 +1369,9 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
 
+	if (boot_cpu_has(X86_FEATURE_ERAPS) && npt_enabled)
+		vmcb_enable_extended_rap(svm->vmcb);
+
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
@@ -3422,6 +3425,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("%-20s%016llx\n", "tsc_offset:", control->tsc_offset);
 	pr_err("%-20s%d\n", "asid:", control->asid);
 	pr_err("%-20s%d\n", "tlb_ctl:", control->tlb_ctl);
+	pr_err("%-20s%d\n", "erap_ctl:", control->erap_ctl);
 	pr_err("%-20s%08x\n", "int_ctl:", control->int_ctl);
 	pr_err("%-20s%08x\n", "int_vector:", control->int_vector);
 	pr_err("%-20s%08x\n", "int_state:", control->int_state);
@@ -3603,6 +3607,9 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 
 		trace_kvm_nested_vmexit(vcpu, KVM_ISA_SVM);
 
+		if (vmcb_is_extended_rap(svm->vmcb01.ptr))
+			vmcb_flush_guest_rap(svm->vmcb01.ptr);
+
 		vmexit = nested_svm_exit_special(svm);
 
 		if (vmexit == NESTED_EXIT_CONTINUE)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index d4490eaed55d..0a29b0d294bb 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -491,6 +491,21 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit)
 	return vmcb_is_intercept(&svm->vmcb->control, bit);
 }
 
+static inline void vmcb_flush_guest_rap(struct vmcb *vmcb)
+{
+	vmcb->control.erap_ctl |= ERAP_CONTROL_FLUSH_RAP;
+}
+
+static inline void vmcb_enable_extended_rap(struct vmcb *vmcb)
+{
+	vmcb->control.erap_ctl |= ERAP_CONTROL_ALLOW_LARGER_RAP;
+}
+
+static inline bool vmcb_is_extended_rap(struct vmcb *vmcb)
+{
+	return !!(vmcb->control.erap_ctl & ERAP_CONTROL_ALLOW_LARGER_RAP);
+}
+
 static inline bool nested_vgif_enabled(struct vcpu_svm *svm)
 {
 	return guest_cpu_cap_has(&svm->vcpu, X86_FEATURE_VGIF) &&
-- 
2.49.0



* [RFC PATCH v4 2/2] debug: add tracepoint for flush_rap_on_vmrun
From: Amit Shah @ 2025-04-02  8:28 UTC
  To: linux-kernel, kvm, x86, linux-doc
  Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe,
	pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc,
	pbonzini, daniel.sneddon, kai.huang, sandipan.das,
	boris.ostrovsky, Babu.Moger, david.kaplan, dwmw, andrew.cooper3

From: Amit Shah <amit.shah@amd.com>

---
 arch/x86/kvm/svm/svm.c |  4 +++-
 arch/x86/kvm/trace.h   | 16 ++++++++++++++++
 arch/x86/kvm/x86.c     |  1 +
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b5de6341080b..c47d4dfcc1d4 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3607,8 +3607,10 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 
 		trace_kvm_nested_vmexit(vcpu, KVM_ISA_SVM);
 
-		if (vmcb_is_extended_rap(svm->vmcb01.ptr))
+		if (vmcb_is_extended_rap(svm->vmcb01.ptr)) {
 			vmcb_flush_guest_rap(svm->vmcb01.ptr);
+			trace_kvm_svm_eraps_flush_rap(svm->vmcb01.ptr);
+		}
 
 		vmexit = nested_svm_exit_special(svm);
 
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index ccda95e53f62..059dfc744a22 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -346,6 +346,22 @@ TRACE_EVENT(name,							     \
  */
 TRACE_EVENT_KVM_EXIT(kvm_exit);
 
+TRACE_EVENT(kvm_svm_eraps_flush_rap,					     \
+	TP_PROTO(struct vmcb *vmcb),					     \
+	TP_ARGS(vmcb),							     \
+									     \
+	TP_STRUCT__entry(						     \
+		__field( struct vmcb *,		vmcb		)	     \
+	),								     \
+									     \
+	TP_fast_assign(							     \
+		__entry->vmcb	= vmcb; 				     \
+	),								     \
+									     \
+	TP_printk("vmcb: 0x%p",						     \
+		  __entry->vmcb)					     \
+)
+
 /*
  * Tracepoint for kvm interrupt injection:
  */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c841817a914a..414a0e6c9c4b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -14024,6 +14024,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_rmp_fault);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_svm_eraps_flush_rap);
 
 static int __init kvm_x86_init(void)
 {
-- 
2.49.0

