* [PATCH 0/2] Add support for the ERAPS feature
@ 2024-10-31 15:39 Amit Shah
2024-10-31 15:39 ` [PATCH 1/2] x86: cpu/bugs: add support for AMD " Amit Shah
2024-10-31 15:39 ` [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN Amit Shah
0 siblings, 2 replies; 24+ messages in thread
From: Amit Shah @ 2024-10-31 15:39 UTC (permalink / raw)
To: linux-kernel, kvm, x86, linux-doc
Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe,
pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc,
pbonzini, daniel.sneddon, kai.huang, sandipan.das,
boris.ostrovsky, Babu.Moger, david.kaplan
From: Amit Shah <amit.shah@amd.com>
Newer AMD CPUs (Turin+) have the ERAPS feature bit that allows us to remove
the RSB filling loops required during context switches and VM exits.
This patchset implements the feature to:
* remove the need for RSB filling on context switches and VMEXITs, in both
the host and guests
* allow KVM guests to use the full default RSB size
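In essence, patch 1 makes the unconditional RSB fill on context switch
conditional on the feature bit (condensed from the patch itself):

    if (!boot_cpu_has(X86_FEATURE_ERAPS)) {
        setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
        pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
    }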
Amit Shah (2):
x86: cpu/bugs: add support for AMD ERAPS feature
x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN
Documentation/admin-guide/hw-vuln/spectre.rst | 5 ++-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/nospec-branch.h | 11 +++++
arch/x86/include/asm/svm.h | 6 ++-
arch/x86/kernel/cpu/bugs.c | 36 ++++++++++-----
arch/x86/kvm/cpuid.c | 15 ++++++-
arch/x86/kvm/svm/svm.c | 44 +++++++++++++++++++
arch/x86/kvm/svm/svm.h | 15 +++++++
8 files changed, 118 insertions(+), 15 deletions(-)
--
2.47.0
* [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-10-31 15:39 [PATCH 0/2] Add support for the ERAPS feature Amit Shah
@ 2024-10-31 15:39 ` Amit Shah
2024-10-31 23:03 ` Pawan Gupta
2024-10-31 23:11 ` Dave Hansen
2024-10-31 15:39 ` [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN Amit Shah
1 sibling, 2 replies; 24+ messages in thread
From: Amit Shah @ 2024-10-31 15:39 UTC (permalink / raw)
To: linux-kernel, kvm, x86, linux-doc
Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe,
pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc,
pbonzini, daniel.sneddon, kai.huang, sandipan.das,
boris.ostrovsky, Babu.Moger, david.kaplan
From: Amit Shah <amit.shah@amd.com>
Remove explicit RET stuffing / filling on VMEXITs and context
switches on AMD CPUs with the ERAPS feature (Turin+).
With the Enhanced Return Address Prediction Security feature, any
hardware TLB flush results in flushing of the RSB (aka RAP in AMD spec).
This guarantees an RSB flush across context switches. The feature also
explicitly tags host and guest addresses - eliminating the need for
explicit flushing of the RSB on VMEXIT.
The BTC_NO feature in AMD CPUs indicates that the CPU does not speculate
RET targets from outside the RSB. Together, the BTC_NO and ERAPS
features ensure no flushing or stuffing of the RSB is necessary anymore.
Feature documented in AMD PPR 57238.
Signed-off-by: Amit Shah <amit.shah@amd.com>
---
Documentation/admin-guide/hw-vuln/spectre.rst | 5 +--
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/nospec-branch.h | 11 ++++++
arch/x86/kernel/cpu/bugs.c | 36 +++++++++++++------
4 files changed, 40 insertions(+), 13 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 132e0bc6007e..647c10c0307a 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -417,9 +417,10 @@ The possible values in this file are:
- Return stack buffer (RSB) protection status:
- ============= ===========================================
+ ============= ========================================================
'RSB filling' Protection of RSB on context switch enabled
- ============= ===========================================
+ 'ERAPS' Hardware RSB flush on context switches + guest/host tags
+ ============= ========================================================
- EIBRS Post-barrier Return Stack Buffer (PBRSB) protection status:
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 913fd3a7bac6..665032b12871 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -458,6 +458,7 @@
#define X86_FEATURE_AUTOIBRS (20*32+ 8) /* Automatic IBRS */
#define X86_FEATURE_NO_SMM_CTL_MSR (20*32+ 9) /* SMM_CTL MSR is not present */
+#define X86_FEATURE_ERAPS (20*32+24) /* Enhanced RAP / RSB / RAS Security */
#define X86_FEATURE_SBPB (20*32+27) /* Selective Branch Prediction Barrier */
#define X86_FEATURE_IBPB_BRTYPE (20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
#define X86_FEATURE_SRSO_NO (20*32+29) /* CPU is not affected by SRSO */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 96b410b1d4e8..24d0fe5d5a8b 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -117,6 +117,17 @@
* We define a CPP macro such that it can be used from both .S files and
* inline assembly. It's possible to do a .macro and then include that
* from C via asm(".include <asm/nospec-branch.h>") but let's not go there.
+ *
+ * AMD CPUs with the ERAPS feature may have a larger default RSB. Such CPUs
+ * use the full default number of entries on the host; a guest uses either
+ * the old 32-entry size or the full default, depending on hypervisor setup.
+ * The default number of entries is reported in CPUID 8000_0021:EBX[23:16].
+ *
+ * With the ERAPS feature, RSB filling is not necessary anymore: the RSB is
+ * auto-cleared on a TLB flush (i.e. a context switch). Adapting the value of
+ * RSB_CLEAR_LOOPS below for ERAPS would turn it into a runtime variable
+ * instead of the current compile-time constant, so leave it as-is; this
+ * works for both older CPUs and newer ones with ERAPS.
*/
#define RETPOLINE_THUNK_SIZE 32
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 47a01d4028f6..83b34a522dd7 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1828,9 +1828,6 @@ static void __init spectre_v2_select_mitigation(void)
* speculated return targets may come from the branch predictor,
* which could have a user-poisoned BTB or BHB entry.
*
- * AMD has it even worse: *all* returns are speculated from the BTB,
- * regardless of the state of the RSB.
- *
* When IBRS or eIBRS is enabled, the "user -> kernel" attack
* scenario is mitigated by the IBRS branch prediction isolation
* properties, so the RSB buffer filling wouldn't be necessary to
@@ -1838,6 +1835,15 @@ static void __init spectre_v2_select_mitigation(void)
*
* The "user -> user" attack scenario is mitigated by RSB filling.
*
+ * AMD CPUs without the BTC_NO bit may speculate return targets
+ * from the BTB. CPUs with BTC_NO do not speculate return targets
+ * from the BTB, even on RSB underflow.
+ *
+ * The ERAPS CPU feature (which implies the presence of BTC_NO)
+ * adds an RSB flush each time a TLB flush happens (i.e., on every
+ * context switch). So, RSB filling is not necessary for this
+ * attack type with ERAPS present.
+ *
* 2) Poisoned RSB entry
*
* If the 'next' in-kernel return stack is shorter than 'prev',
@@ -1848,17 +1854,24 @@ static void __init spectre_v2_select_mitigation(void)
* eIBRS.
*
* The "user -> user" scenario, also known as SpectreBHB, requires
- * RSB clearing.
+ * RSB clearing on processors without ERAPS.
*
* So to mitigate all cases, unconditionally fill RSB on context
- * switches.
- *
- * FIXME: Is this pointless for retbleed-affected AMD?
+ * switches when ERAPS is not present.
*/
- setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
- pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
+ if (!boot_cpu_has(X86_FEATURE_ERAPS)) {
+ setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
+ pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
- spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
+ /*
+ * For guest -> host (or vice versa) RSB poisoning scenarios,
+ * determine the mitigation mode here. With ERAPS, RSB
+ * entries are tagged as host or guest, so neither the host
+ * nor the guest has to clear or fill RSB entries to avoid
+ * poisoning; skip RSB filling at VMEXIT in that case.
+ */
+ spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
+ }
/*
* Retpoline protects the kernel, but doesn't protect firmware. IBRS
@@ -2871,7 +2884,7 @@ static ssize_t spectre_v2_show_state(char *buf)
spectre_v2_enabled == SPECTRE_V2_EIBRS_LFENCE)
return sysfs_emit(buf, "Vulnerable: eIBRS+LFENCE with unprivileged eBPF and SMT\n");
- return sysfs_emit(buf, "%s%s%s%s%s%s%s%s\n",
+ return sysfs_emit(buf, "%s%s%s%s%s%s%s%s%s\n",
spectre_v2_strings[spectre_v2_enabled],
ibpb_state(),
boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? "; IBRS_FW" : "",
@@ -2879,6 +2892,7 @@ static ssize_t spectre_v2_show_state(char *buf)
boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? "; RSB filling" : "",
pbrsb_eibrs_state(),
spectre_bhi_state(),
+ boot_cpu_has(X86_FEATURE_ERAPS) ? "; ERAPS hardware RSB flush" : "",
/* this should always be at the end */
spectre_v2_module_string());
}
--
2.47.0
* [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN
2024-10-31 15:39 [PATCH 0/2] Add support for the ERAPS feature Amit Shah
2024-10-31 15:39 ` [PATCH 1/2] x86: cpu/bugs: add support for AMD " Amit Shah
@ 2024-10-31 15:39 ` Amit Shah
2024-10-31 23:13 ` Pawan Gupta
` (2 more replies)
1 sibling, 3 replies; 24+ messages in thread
From: Amit Shah @ 2024-10-31 15:39 UTC (permalink / raw)
To: linux-kernel, kvm, x86, linux-doc
Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe,
pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc,
pbonzini, daniel.sneddon, kai.huang, sandipan.das,
boris.ostrovsky, Babu.Moger, david.kaplan
From: Amit Shah <amit.shah@amd.com>
AMD CPUs with the ERAPS feature (Turin+) have a larger RSB (aka RAP).
While the host uses the new default RSB size without any software
changes, guests are limited to the older 32-entry size for backwards
compatibility. Have KVM enable the full default number of entries in
guests as well, by setting the new ALLOW_LARGER_RAP bit in the VMCB.
Two cases need special handling for backwards compatibility: nested
guests, and guests using shadow paging (i.e. when NPT is disabled):
For nested guests: the ERAPS feature adds host/guest tagging to entries
in the RSB, but does not distinguish between ASIDs. On a nested exit,
the L0 hypervisor instructs the microcode (via another new VMCB bit,
FLUSH_RAP_ON_VMRUN) to flush the RSB on the next VMRUN to prevent RSB
poisoning attacks from an L2 guest to an L1 guest. With that in place,
this feature can be exposed to guests.
For shadow paging guests: only expose this feature when nested paging
is enabled, which ensures context switches within the guest trigger TLB
flushes on the CPU, and thereby flush guest RSB entries. With shadow
paging, the CPU's CR3 does not hold the guest process's CR3, so such
guests cannot benefit from this feature; do not expose it to them.
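Condensed sketch of the KVM side (from the hunks below): set
ALLOW_LARGER_RAP at VMCB init when NPT is enabled, and request an RSB
flush on the next VMRUN when handling a nested #VMEXIT:

    /* init_vmcb(): expose the larger RSB only with NPT enabled */
    if (boot_cpu_has(X86_FEATURE_ERAPS) && npt_enabled)
        vmcb_set_larger_rap(svm->vmcb);

    /* nested #VMEXIT: have the next VMRUN flush the (tagged) RSB */
    if (boot_cpu_has(X86_FEATURE_ERAPS) && vmcb_is_larger_rap(svm->vmcb01.ptr))
        vmcb_set_flush_rap(svm->vmcb01.ptr);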
Signed-off-by: Amit Shah <amit.shah@amd.com>
---
arch/x86/include/asm/svm.h | 6 +++++-
arch/x86/kvm/cpuid.c | 15 ++++++++++++-
arch/x86/kvm/svm/svm.c | 44 ++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.h | 15 +++++++++++++
4 files changed, 78 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 2b59b9951c90..f8584a63c859 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -129,7 +129,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
u64 tsc_offset;
u32 asid;
u8 tlb_ctl;
- u8 reserved_2[3];
+ u8 erap_ctl;
+ u8 reserved_2[2];
u32 int_ctl;
u32 int_vector;
u32 int_state;
@@ -175,6 +176,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
#define TLB_CONTROL_FLUSH_ASID 3
#define TLB_CONTROL_FLUSH_ASID_LOCAL 7
+#define ERAP_CONTROL_ALLOW_LARGER_RAP 0
+#define ERAP_CONTROL_FLUSH_RAP 1
+
#define V_TPR_MASK 0x0f
#define V_IRQ_SHIFT 8
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 41786b834b16..2c2a60964a2e 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -797,6 +797,8 @@ void kvm_set_cpu_caps(void)
F(WRMSR_XX_BASE_NS)
);
+ if (tdp_enabled)
+ kvm_cpu_cap_check_and_set(X86_FEATURE_ERAPS);
kvm_cpu_cap_check_and_set(X86_FEATURE_SBPB);
kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE);
kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO);
@@ -1357,8 +1359,19 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
break;
case 0x80000021:
- entry->ebx = entry->ecx = entry->edx = 0;
+ unsigned int ebx_mask = 0;
+
+ entry->ecx = entry->edx = 0;
cpuid_entry_override(entry, CPUID_8000_0021_EAX);
+
+ /*
+ * Bits 23:16 in EBX indicate the size of the RSB.
+ * Expose the value in the hardware to the guest.
+ */
+ if (kvm_cpu_cap_has(X86_FEATURE_ERAPS))
+ ebx_mask |= GENMASK(23, 16);
+
+ entry->ebx &= ebx_mask;
break;
/* AMD Extended Performance Monitoring and Debug */
case 0x80000022: {
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9df3e1e5ae81..ecd290ff38f8 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1360,6 +1360,28 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
+ /*
+ * If the hardware has a larger RSB, use it in the guest context as
+ * well.
+ *
+ * When running nested guests: the hardware tags host and guest RSB
+ * entries, but the entries are ASID agnostic. Differentiating L1 and
+ * L2 guests isn't possible in hardware. To prevent L2->L1 RSB
+ * poisoning attacks in this case, the L0 hypervisor must set
+ * FLUSH_RAP_ON_VMRUN in the L1's VMCB on a nested #VMEXIT to ensure
+ * the next VMRUN flushes the RSB.
+ *
+ * For the shadow paging / NPT disabled case: the CPU's CR3 does not
+ * contain the CR3 of the running guest process, so intra-guest context
+ * switches do not cause a hardware TLB flush, and hence no
+ * ERAPS-provided guest RSB flush either. Do not expose ERAPS or the
+ * larger RSB to the guest in this case, so the guest continues to
+ * implement software mitigations and only sees 32 RSB entries.
+ */
+ if (boot_cpu_has(X86_FEATURE_ERAPS) && npt_enabled)
+ vmcb_set_larger_rap(svm->vmcb);
+
if (kvm_vcpu_apicv_active(vcpu))
avic_init_vmcb(svm, vmcb);
@@ -3393,6 +3415,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
pr_err("%-20s%016llx\n", "tsc_offset:", control->tsc_offset);
pr_err("%-20s%d\n", "asid:", control->asid);
pr_err("%-20s%d\n", "tlb_ctl:", control->tlb_ctl);
+ pr_err("%-20s%d\n", "erap_ctl:", control->erap_ctl);
pr_err("%-20s%08x\n", "int_ctl:", control->int_ctl);
pr_err("%-20s%08x\n", "int_vector:", control->int_vector);
pr_err("%-20s%08x\n", "int_state:", control->int_state);
@@ -3559,6 +3582,27 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
trace_kvm_nested_vmexit(vcpu, KVM_ISA_SVM);
+ if (boot_cpu_has(X86_FEATURE_ERAPS)
+ && vmcb_is_larger_rap(svm->vmcb01.ptr)) {
+ /*
+ * XXX a few further optimizations can be made:
+ *
+ * 1. In pre_svm_run() we can reset this bit when a hw
+ * TLB flush has happened - any context switch on a
+ * CPU (which causes a TLB flush) auto-flushes the RSB,
+ * e.g. when this vCPU is scheduled on a different
+ * pCPU.
+ *
+ * 2. This is also not needed in the case where the
+ * vCPU is being scheduled on the same pCPU, but there
+ * was a context switch between the #VMEXIT and VMRUN.
+ *
+ * 3. If the guest returns to L2 again after this
+ * #VMEXIT, there's no need to flush the RSB.
+ */
+ vmcb_set_flush_rap(svm->vmcb01.ptr);
+ }
+
vmexit = nested_svm_exit_special(svm);
if (vmexit == NESTED_EXIT_CONTINUE)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 43fa6a16eb19..8a7877f46dc5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -500,6 +500,21 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit)
return vmcb_is_intercept(&svm->vmcb->control, bit);
}
+static inline void vmcb_set_flush_rap(struct vmcb *vmcb)
+{
+ __set_bit(ERAP_CONTROL_FLUSH_RAP, (unsigned long *)&vmcb->control.erap_ctl);
+}
+
+static inline void vmcb_set_larger_rap(struct vmcb *vmcb)
+{
+ __set_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl);
+}
+
+static inline bool vmcb_is_larger_rap(struct vmcb *vmcb)
+{
+ return test_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl);
+}
+
static inline bool nested_vgif_enabled(struct vcpu_svm *svm)
{
return guest_can_use(&svm->vcpu, X86_FEATURE_VGIF) &&
--
2.47.0
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-10-31 15:39 ` [PATCH 1/2] x86: cpu/bugs: add support for AMD " Amit Shah
@ 2024-10-31 23:03 ` Pawan Gupta
2024-11-04 8:57 ` Shah, Amit
2024-10-31 23:11 ` Dave Hansen
1 sibling, 1 reply; 24+ messages in thread
From: Pawan Gupta @ 2024-10-31 23:03 UTC (permalink / raw)
To: Amit Shah
Cc: linux-kernel, kvm, x86, linux-doc, amit.shah, thomas.lendacky, bp,
tglx, peterz, jpoimboe, corbet, mingo, dave.hansen, hpa, seanjc,
pbonzini, daniel.sneddon, kai.huang, sandipan.das,
boris.ostrovsky, Babu.Moger, david.kaplan
On Thu, Oct 31, 2024 at 04:39:24PM +0100, Amit Shah wrote:
> From: Amit Shah <amit.shah@amd.com>
>
> Remove explicit RET stuffing / filling on VMEXITs and context
> switches on AMD CPUs with the ERAPS feature (Turin+).
>
> With the Enhanced Return Address Prediction Security feature, any
> hardware TLB flush results in flushing of the RSB (aka RAP in AMD spec).
> This guarantees an RSB flush across context switches.
Is it that the mov to CR3 triggers the RSB flush?
> Feature documented in AMD PPR 57238.
I couldn't find the ERAPS feature description here; I could only manage
to find the bit position:
24 ERAPS. Read-only. Reset: 1. Indicates support for enhanced return
address predictor security.
Could you please point me to the document/section where this is described?
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-10-31 15:39 ` [PATCH 1/2] x86: cpu/bugs: add support for AMD " Amit Shah
2024-10-31 23:03 ` Pawan Gupta
@ 2024-10-31 23:11 ` Dave Hansen
2024-11-04 8:58 ` Shah, Amit
1 sibling, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2024-10-31 23:11 UTC (permalink / raw)
To: Amit Shah, linux-kernel, kvm, x86, linux-doc
Cc: amit.shah, thomas.lendacky, bp, tglx, peterz, jpoimboe,
pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc,
pbonzini, daniel.sneddon, kai.huang, sandipan.das,
boris.ostrovsky, Babu.Moger, david.kaplan
On 10/31/24 08:39, Amit Shah wrote:
...
> With the Enhanced Return Address Prediction Security feature, any
> hardware TLB flush results in flushing of the RSB (aka RAP in AMD spec).
> This guarantees an RSB flush across context switches.
Check out the APM, volume 2: "5.5.1 Process Context Identifier"
... when system software switches address spaces (by writing ...
CR3[62:12]), the processor may use TLB mappings previously
stored for that address space and PCID, providing that bit 63 of
the source operand is set to 1.
tl;dr: PCIDs mean you don't necessarily flush the TLB on context switches.
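A minimal sketch of the no-flush CR3 write in question (illustrative
names, not the kernel's actual helpers): with PCIDs, the context-switch
CR3 write can set bit 63 so the CPU keeps the TLB entries tagged with
the target PCID, i.e. no TLB flush happens:

    #define CR3_PCID_NOFLUSH  (1ULL << 63)  /* bit 63 of the CR3 source operand */

    static unsigned long build_cr3_sketch(unsigned long pgd_pa, unsigned short pcid,
                                          bool need_flush)
    {
        unsigned long cr3 = pgd_pa | pcid;  /* PCID lives in CR3[11:0] */

        if (!need_flush)
            cr3 |= CR3_PCID_NOFLUSH;        /* CPU may reuse prior TLB mappings */
        return cr3;
    }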
* Re: [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN
2024-10-31 15:39 ` [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN Amit Shah
@ 2024-10-31 23:13 ` Pawan Gupta
2024-11-01 4:14 ` kernel test robot
2024-11-04 5:18 ` Borislav Petkov
2 siblings, 0 replies; 24+ messages in thread
From: Pawan Gupta @ 2024-10-31 23:13 UTC (permalink / raw)
To: Amit Shah
Cc: linux-kernel, kvm, x86, linux-doc, amit.shah, thomas.lendacky, bp,
tglx, peterz, jpoimboe, corbet, mingo, dave.hansen, hpa, seanjc,
pbonzini, daniel.sneddon, kai.huang, sandipan.das,
boris.ostrovsky, Babu.Moger, david.kaplan
On Thu, Oct 31, 2024 at 04:39:25PM +0100, Amit Shah wrote:
> @@ -3559,6 +3582,27 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
>
> trace_kvm_nested_vmexit(vcpu, KVM_ISA_SVM);
>
> + if (boot_cpu_has(X86_FEATURE_ERAPS)
> + && vmcb_is_larger_rap(svm->vmcb01.ptr)) {
^
This should be at the end of the previous line.
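I.e., kernel coding style puts the operator at the end of the first
line (sketch):

    if (boot_cpu_has(X86_FEATURE_ERAPS) &&
        vmcb_is_larger_rap(svm->vmcb01.ptr)) {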
* Re: [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN
2024-10-31 15:39 ` [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN Amit Shah
2024-10-31 23:13 ` Pawan Gupta
@ 2024-11-01 4:14 ` kernel test robot
2024-11-01 9:26 ` Amit Shah
2024-11-04 5:18 ` Borislav Petkov
2 siblings, 1 reply; 24+ messages in thread
From: kernel test robot @ 2024-11-01 4:14 UTC (permalink / raw)
To: Amit Shah, linux-kernel, kvm, x86, linux-doc
Cc: llvm, oe-kbuild-all, amit.shah, thomas.lendacky, bp, tglx, peterz,
jpoimboe, pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa,
seanjc, pbonzini, daniel.sneddon, kai.huang, sandipan.das,
boris.ostrovsky, Babu.Moger, david.kaplan
Hi Amit,
kernel test robot noticed the following build warnings:
[auto build test WARNING on kvm/queue]
[also build test WARNING on mst-vhost/linux-next tip/master tip/x86/core linus/master v6.12-rc5 next-20241031]
[cannot apply to kvm/linux-next tip/auto-latest]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Amit-Shah/x86-cpu-bugs-add-support-for-AMD-ERAPS-feature/20241031-234332
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
patch link: https://lore.kernel.org/r/20241031153925.36216-3-amit%40kernel.org
patch subject: [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20241101/202411011119.l3yRJpht-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241101/202411011119.l3yRJpht-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202411011119.l3yRJpht-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from arch/x86/kvm/cpuid.c:13:
In file included from include/linux/kvm_host.h:16:
In file included from include/linux/mm.h:2213:
include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
504 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
505 | item];
| ~~~~
include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
511 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
512 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
524 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
525 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
>> arch/x86/kvm/cpuid.c:1362:3: warning: label followed by a declaration is a C23 extension [-Wc23-extensions]
1362 | unsigned int ebx_mask = 0;
| ^
5 warnings generated.
vim +1362 arch/x86/kvm/cpuid.c
940
941 static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
942 {
943 struct kvm_cpuid_entry2 *entry;
944 int r, i, max_idx;
945
946 /* all calls to cpuid_count() should be made on the same cpu */
947 get_cpu();
948
949 r = -E2BIG;
950
951 entry = do_host_cpuid(array, function, 0);
952 if (!entry)
953 goto out;
954
955 switch (function) {
956 case 0:
957 /* Limited to the highest leaf implemented in KVM. */
958 entry->eax = min(entry->eax, 0x24U);
959 break;
960 case 1:
961 cpuid_entry_override(entry, CPUID_1_EDX);
962 cpuid_entry_override(entry, CPUID_1_ECX);
963 break;
964 case 2:
965 /*
966 * On ancient CPUs, function 2 entries are STATEFUL. That is,
967 * CPUID(function=2, index=0) may return different results each
968 * time, with the least-significant byte in EAX enumerating the
969 * number of times software should do CPUID(2, 0).
970 *
971 * Modern CPUs, i.e. every CPU KVM has *ever* run on are less
972 * idiotic. Intel's SDM states that EAX & 0xff "will always
973 * return 01H. Software should ignore this value and not
974 * interpret it as an informational descriptor", while AMD's
975 * APM states that CPUID(2) is reserved.
976 *
977 * WARN if a frankenstein CPU that supports virtualization and
978 * a stateful CPUID.0x2 is encountered.
979 */
980 WARN_ON_ONCE((entry->eax & 0xff) > 1);
981 break;
982 /* functions 4 and 0x8000001d have additional index. */
983 case 4:
984 case 0x8000001d:
985 /*
986 * Read entries until the cache type in the previous entry is
987 * zero, i.e. indicates an invalid entry.
988 */
989 for (i = 1; entry->eax & 0x1f; ++i) {
990 entry = do_host_cpuid(array, function, i);
991 if (!entry)
992 goto out;
993 }
994 break;
995 case 6: /* Thermal management */
996 entry->eax = 0x4; /* allow ARAT */
997 entry->ebx = 0;
998 entry->ecx = 0;
999 entry->edx = 0;
1000 break;
1001 /* function 7 has additional index. */
1002 case 7:
1003 max_idx = entry->eax = min(entry->eax, 2u);
1004 cpuid_entry_override(entry, CPUID_7_0_EBX);
1005 cpuid_entry_override(entry, CPUID_7_ECX);
1006 cpuid_entry_override(entry, CPUID_7_EDX);
1007
1008 /* KVM only supports up to 0x7.2, capped above via min(). */
1009 if (max_idx >= 1) {
1010 entry = do_host_cpuid(array, function, 1);
1011 if (!entry)
1012 goto out;
1013
1014 cpuid_entry_override(entry, CPUID_7_1_EAX);
1015 cpuid_entry_override(entry, CPUID_7_1_EDX);
1016 entry->ebx = 0;
1017 entry->ecx = 0;
1018 }
1019 if (max_idx >= 2) {
1020 entry = do_host_cpuid(array, function, 2);
1021 if (!entry)
1022 goto out;
1023
1024 cpuid_entry_override(entry, CPUID_7_2_EDX);
1025 entry->ecx = 0;
1026 entry->ebx = 0;
1027 entry->eax = 0;
1028 }
1029 break;
1030 case 0xa: { /* Architectural Performance Monitoring */
1031 union cpuid10_eax eax;
1032 union cpuid10_edx edx;
1033
1034 if (!enable_pmu || !static_cpu_has(X86_FEATURE_ARCH_PERFMON)) {
1035 entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
1036 break;
1037 }
1038
1039 eax.split.version_id = kvm_pmu_cap.version;
1040 eax.split.num_counters = kvm_pmu_cap.num_counters_gp;
1041 eax.split.bit_width = kvm_pmu_cap.bit_width_gp;
1042 eax.split.mask_length = kvm_pmu_cap.events_mask_len;
1043 edx.split.num_counters_fixed = kvm_pmu_cap.num_counters_fixed;
1044 edx.split.bit_width_fixed = kvm_pmu_cap.bit_width_fixed;
1045
1046 if (kvm_pmu_cap.version)
1047 edx.split.anythread_deprecated = 1;
1048 edx.split.reserved1 = 0;
1049 edx.split.reserved2 = 0;
1050
1051 entry->eax = eax.full;
1052 entry->ebx = kvm_pmu_cap.events_mask;
1053 entry->ecx = 0;
1054 entry->edx = edx.full;
1055 break;
1056 }
1057 case 0x1f:
1058 case 0xb:
1059 /*
1060 * No topology; a valid topology is indicated by the presence
1061 * of subleaf 1.
1062 */
1063 entry->eax = entry->ebx = entry->ecx = 0;
1064 break;
1065 case 0xd: {
1066 u64 permitted_xcr0 = kvm_get_filtered_xcr0();
1067 u64 permitted_xss = kvm_caps.supported_xss;
1068
1069 entry->eax &= permitted_xcr0;
1070 entry->ebx = xstate_required_size(permitted_xcr0, false);
1071 entry->ecx = entry->ebx;
1072 entry->edx &= permitted_xcr0 >> 32;
1073 if (!permitted_xcr0)
1074 break;
1075
1076 entry = do_host_cpuid(array, function, 1);
1077 if (!entry)
1078 goto out;
1079
1080 cpuid_entry_override(entry, CPUID_D_1_EAX);
1081 if (entry->eax & (F(XSAVES)|F(XSAVEC)))
1082 entry->ebx = xstate_required_size(permitted_xcr0 | permitted_xss,
1083 true);
1084 else {
1085 WARN_ON_ONCE(permitted_xss != 0);
1086 entry->ebx = 0;
1087 }
1088 entry->ecx &= permitted_xss;
1089 entry->edx &= permitted_xss >> 32;
1090
1091 for (i = 2; i < 64; ++i) {
1092 bool s_state;
1093 if (permitted_xcr0 & BIT_ULL(i))
1094 s_state = false;
1095 else if (permitted_xss & BIT_ULL(i))
1096 s_state = true;
1097 else
1098 continue;
1099
1100 entry = do_host_cpuid(array, function, i);
1101 if (!entry)
1102 goto out;
1103
1104 /*
1105 * The supported check above should have filtered out
1106 * invalid sub-leafs. Only valid sub-leafs should
1107 * reach this point, and they should have a non-zero
1108 * save state size. Furthermore, check whether the
1109 * processor agrees with permitted_xcr0/permitted_xss
1110 * on whether this is an XCR0- or IA32_XSS-managed area.
1111 */
1112 if (WARN_ON_ONCE(!entry->eax || (entry->ecx & 0x1) != s_state)) {
1113 --array->nent;
1114 continue;
1115 }
1116
1117 if (!kvm_cpu_cap_has(X86_FEATURE_XFD))
1118 entry->ecx &= ~BIT_ULL(2);
1119 entry->edx = 0;
1120 }
1121 break;
1122 }
1123 case 0x12:
1124 /* Intel SGX */
1125 if (!kvm_cpu_cap_has(X86_FEATURE_SGX)) {
1126 entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
1127 break;
1128 }
1129
1130 /*
1131 * Index 0: Sub-features, MISCSELECT (a.k.a extended features)
1132 * and max enclave sizes. The SGX sub-features and MISCSELECT
1133 * are restricted by kernel and KVM capabilities (like most
1134 * feature flags), while enclave size is unrestricted.
1135 */
1136 cpuid_entry_override(entry, CPUID_12_EAX);
1137 entry->ebx &= SGX_MISC_EXINFO;
1138
1139 entry = do_host_cpuid(array, function, 1);
1140 if (!entry)
1141 goto out;
1142
1143 /*
1144 * Index 1: SECS.ATTRIBUTES. ATTRIBUTES are restricted a la
1145 * feature flags. Advertise all supported flags, including
1146 * privileged attributes that require explicit opt-in from
1147 * userspace. ATTRIBUTES.XFRM is not adjusted as userspace is
1148 * expected to derive it from supported XCR0.
1149 */
1150 entry->eax &= SGX_ATTR_PRIV_MASK | SGX_ATTR_UNPRIV_MASK;
1151 entry->ebx &= 0;
1152 break;
1153 /* Intel PT */
1154 case 0x14:
1155 if (!kvm_cpu_cap_has(X86_FEATURE_INTEL_PT)) {
1156 entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
1157 break;
1158 }
1159
1160 for (i = 1, max_idx = entry->eax; i <= max_idx; ++i) {
1161 if (!do_host_cpuid(array, function, i))
1162 goto out;
1163 }
1164 break;
1165 /* Intel AMX TILE */
1166 case 0x1d:
1167 if (!kvm_cpu_cap_has(X86_FEATURE_AMX_TILE)) {
1168 entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
1169 break;
1170 }
1171
1172 for (i = 1, max_idx = entry->eax; i <= max_idx; ++i) {
1173 if (!do_host_cpuid(array, function, i))
1174 goto out;
1175 }
1176 break;
1177 case 0x1e: /* TMUL information */
1178 if (!kvm_cpu_cap_has(X86_FEATURE_AMX_TILE)) {
1179 entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
1180 break;
1181 }
1182 break;
1183 case 0x24: {
1184 u8 avx10_version;
1185
1186 if (!kvm_cpu_cap_has(X86_FEATURE_AVX10)) {
1187 entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
1188 break;
1189 }
1190
1191 /*
1192 * The AVX10 version is encoded in EBX[7:0]. Note, the version
1193 * is guaranteed to be >=1 if AVX10 is supported. Note #2, the
1194 * version needs to be captured before overriding EBX features!
1195 */
1196 avx10_version = min_t(u8, entry->ebx & 0xff, 1);
1197 cpuid_entry_override(entry, CPUID_24_0_EBX);
1198 entry->ebx |= avx10_version;
1199
1200 entry->eax = 0;
1201 entry->ecx = 0;
1202 entry->edx = 0;
1203 break;
1204 }
1205 case KVM_CPUID_SIGNATURE: {
1206 const u32 *sigptr = (const u32 *)KVM_SIGNATURE;
1207 entry->eax = KVM_CPUID_FEATURES;
1208 entry->ebx = sigptr[0];
1209 entry->ecx = sigptr[1];
1210 entry->edx = sigptr[2];
1211 break;
1212 }
1213 case KVM_CPUID_FEATURES:
1214 entry->eax = (1 << KVM_FEATURE_CLOCKSOURCE) |
1215 (1 << KVM_FEATURE_NOP_IO_DELAY) |
1216 (1 << KVM_FEATURE_CLOCKSOURCE2) |
1217 (1 << KVM_FEATURE_ASYNC_PF) |
1218 (1 << KVM_FEATURE_PV_EOI) |
1219 (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT) |
1220 (1 << KVM_FEATURE_PV_UNHALT) |
1221 (1 << KVM_FEATURE_PV_TLB_FLUSH) |
1222 (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
1223 (1 << KVM_FEATURE_PV_SEND_IPI) |
1224 (1 << KVM_FEATURE_POLL_CONTROL) |
1225 (1 << KVM_FEATURE_PV_SCHED_YIELD) |
1226 (1 << KVM_FEATURE_ASYNC_PF_INT);
1227
1228 if (sched_info_on())
1229 entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
1230
1231 entry->ebx = 0;
1232 entry->ecx = 0;
1233 entry->edx = 0;
1234 break;
1235 case 0x80000000:
1236 entry->eax = min(entry->eax, 0x80000022);
1237 /*
1238 * Serializing LFENCE is reported in a multitude of ways, and
1239 * NullSegClearsBase is not reported in CPUID on Zen2; help
1240 * userspace by providing the CPUID leaf ourselves.
1241 *
1242 * However, only do it if the host has CPUID leaf 0x8000001d.
1243 * QEMU thinks that it can query the host blindly for that
1244 * CPUID leaf if KVM reports that it supports 0x8000001d or
1245 * above. The processor merrily returns values from the
1246 * highest Intel leaf which QEMU tries to use as the guest's
1247 * 0x8000001d. Even worse, this can result in an infinite
1248 * loop if said highest leaf has no subleaves indexed by ECX.
1249 */
1250 if (entry->eax >= 0x8000001d &&
1251 (static_cpu_has(X86_FEATURE_LFENCE_RDTSC)
1252 || !static_cpu_has_bug(X86_BUG_NULL_SEG)))
1253 entry->eax = max(entry->eax, 0x80000021);
1254 break;
1255 case 0x80000001:
1256 entry->ebx &= ~GENMASK(27, 16);
1257 cpuid_entry_override(entry, CPUID_8000_0001_EDX);
1258 cpuid_entry_override(entry, CPUID_8000_0001_ECX);
1259 break;
1260 case 0x80000005:
1261 /* Pass host L1 cache and TLB info. */
1262 break;
1263 case 0x80000006:
1264 /* Drop reserved bits, pass host L2 cache and TLB info. */
1265 entry->edx &= ~GENMASK(17, 16);
1266 break;
1267 case 0x80000007: /* Advanced power management */
1268 cpuid_entry_override(entry, CPUID_8000_0007_EDX);
1269
1270 /* mask against host */
1271 entry->edx &= boot_cpu_data.x86_power;
1272 entry->eax = entry->ebx = entry->ecx = 0;
1273 break;
1274 case 0x80000008: {
1275 /*
1276 * GuestPhysAddrSize (EAX[23:16]) is intended for software
1277 * use.
1278 *
1279 * KVM's ABI is to report the effective MAXPHYADDR for the
1280 * guest in PhysAddrSize (phys_as), and the maximum
1281 * *addressable* GPA in GuestPhysAddrSize (g_phys_as).
1282 *
1283 * GuestPhysAddrSize is valid if and only if TDP is enabled,
1284 * in which case the max GPA that can be addressed by KVM may
1285 * be less than the max GPA that can be legally generated by
1286 * the guest, e.g. if MAXPHYADDR>48 but the CPU doesn't
1287 * support 5-level TDP.
1288 */
1289 unsigned int virt_as = max((entry->eax >> 8) & 0xff, 48U);
1290 unsigned int phys_as, g_phys_as;
1291
1292 /*
1293 * If TDP (NPT) is disabled use the adjusted host MAXPHYADDR as
1294 * the guest operates in the same PA space as the host, i.e.
1295 * reductions in MAXPHYADDR for memory encryption affect shadow
1296 * paging, too.
1297 *
1298 * If TDP is enabled, use the raw bare metal MAXPHYADDR as
1299 * reductions to the HPAs do not affect GPAs. The max
1300 * addressable GPA is the same as the max effective GPA, except
1301 * that it's capped at 48 bits if 5-level TDP isn't supported
1302 * (hardware processes bits 51:48 only when walking the fifth
1303 * level page table).
1304 */
1305 if (!tdp_enabled) {
1306 phys_as = boot_cpu_data.x86_phys_bits;
1307 g_phys_as = 0;
1308 } else {
1309 phys_as = entry->eax & 0xff;
1310 g_phys_as = phys_as;
1311 if (kvm_mmu_get_max_tdp_level() < 5)
1312 g_phys_as = min(g_phys_as, 48);
1313 }
1314
1315 entry->eax = phys_as | (virt_as << 8) | (g_phys_as << 16);
1316 entry->ecx &= ~(GENMASK(31, 16) | GENMASK(11, 8));
1317 entry->edx = 0;
1318 cpuid_entry_override(entry, CPUID_8000_0008_EBX);
1319 break;
1320 }
1321 case 0x8000000A:
1322 if (!kvm_cpu_cap_has(X86_FEATURE_SVM)) {
1323 entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
1324 break;
1325 }
1326 entry->eax = 1; /* SVM revision 1 */
1327 entry->ebx = 8; /* Lets support 8 ASIDs in case we add proper
1328 ASID emulation to nested SVM */
1329 entry->ecx = 0; /* Reserved */
1330 cpuid_entry_override(entry, CPUID_8000_000A_EDX);
1331 break;
1332 case 0x80000019:
1333 entry->ecx = entry->edx = 0;
1334 break;
1335 case 0x8000001a:
1336 entry->eax &= GENMASK(2, 0);
1337 entry->ebx = entry->ecx = entry->edx = 0;
1338 break;
1339 case 0x8000001e:
1340 /* Do not return host topology information. */
1341 entry->eax = entry->ebx = entry->ecx = 0;
1342 entry->edx = 0; /* reserved */
1343 break;
1344 case 0x8000001F:
1345 if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
1346 entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
1347 } else {
1348 cpuid_entry_override(entry, CPUID_8000_001F_EAX);
1349 /* Clear NumVMPL since KVM does not support VMPL. */
1350 entry->ebx &= ~GENMASK(31, 12);
1351 /*
1352 * Enumerate '0' for "PA bits reduction", the adjusted
1353 * MAXPHYADDR is enumerated directly (see 0x80000008).
1354 */
1355 entry->ebx &= ~GENMASK(11, 6);
1356 }
1357 break;
1358 case 0x80000020:
1359 entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
1360 break;
1361 case 0x80000021:
> 1362 unsigned int ebx_mask = 0;
1363
1364 entry->ecx = entry->edx = 0;
1365 cpuid_entry_override(entry, CPUID_8000_0021_EAX);
1366
1367 /*
1368 * Bits 23:16 in EBX indicate the size of the RSB.
1369 * Expose the value in the hardware to the guest.
1370 */
1371 if (kvm_cpu_cap_has(X86_FEATURE_ERAPS))
1372 ebx_mask |= GENMASK(23, 16);
1373
1374 entry->ebx &= ebx_mask;
1375 break;
1376 /* AMD Extended Performance Monitoring and Debug */
1377 case 0x80000022: {
1378 union cpuid_0x80000022_ebx ebx;
1379
1380 entry->ecx = entry->edx = 0;
1381 if (!enable_pmu || !kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2)) {
1382 entry->eax = entry->ebx;
1383 break;
1384 }
1385
1386 cpuid_entry_override(entry, CPUID_8000_0022_EAX);
1387
1388 if (kvm_cpu_cap_has(X86_FEATURE_PERFMON_V2))
1389 ebx.split.num_core_pmc = kvm_pmu_cap.num_counters_gp;
1390 else if (kvm_cpu_cap_has(X86_FEATURE_PERFCTR_CORE))
1391 ebx.split.num_core_pmc = AMD64_NUM_COUNTERS_CORE;
1392 else
1393 ebx.split.num_core_pmc = AMD64_NUM_COUNTERS;
1394
1395 entry->ebx = ebx.full;
1396 break;
1397 }
1398 /*Add support for Centaur's CPUID instruction*/
1399 case 0xC0000000:
1400 /*Just support up to 0xC0000004 now*/
1401 entry->eax = min(entry->eax, 0xC0000004);
1402 break;
1403 case 0xC0000001:
1404 cpuid_entry_override(entry, CPUID_C000_0001_EDX);
1405 break;
1406 case 3: /* Processor serial number */
1407 case 5: /* MONITOR/MWAIT */
1408 case 0xC0000002:
1409 case 0xC0000003:
1410 case 0xC0000004:
1411 default:
1412 entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
1413 break;
1414 }
1415
1416 r = 0;
1417
1418 out:
1419 put_cpu();
1420
1421 return r;
1422 }
1423
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN
2024-11-01 4:14 ` kernel test robot
@ 2024-11-01 9:26 ` Amit Shah
0 siblings, 0 replies; 24+ messages in thread
From: Amit Shah @ 2024-11-01 9:26 UTC (permalink / raw)
To: kernel test robot, amit.shah, linux-kernel, kvm, x86, linux-doc
Cc: llvm, oe-kbuild-all, thomas.lendacky, bp, tglx, peterz, jpoimboe,
pawan.kumar.gupta, corbet, mingo, dave.hansen, hpa, seanjc,
pbonzini, daniel.sneddon, kai.huang, sandipan.das,
boris.ostrovsky, Babu.Moger, david.kaplan
On Fri, 2024-11-01 at 12:14 +0800, kernel test robot wrote:
> Hi Amit,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on kvm/queue]
> [also build test WARNING on mst-vhost/linux-next tip/master
> tip/x86/core linus/master v6.12-rc5 next-20241031]
> [cannot apply to kvm/linux-next tip/auto-latest]
> [If your patch is applied to the wrong git tree, kindly drop us a
> note.
> And when submitting patch, we suggest to use '--base' as documented
> in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:
> https://github.com/intel-lab-lkp/linux/commits/Amit-Shah/x86-cpu-bugs-add-support-for-AMD-ERAPS-feature/20241031-234332
> base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
> patch link:
> https://lore.kernel.org/r/20241031153925.36216-3-amit%40kernel.org
> patch subject: [PATCH 2/2] x86: kvm: svm: add support for ERAPS and
> FLUSH_RAP_ON_VMRUN
> config: x86_64-kexec
> (https://download.01.org/0day-ci/archive/20241101/202411011119.l3yRJp
> ht-lkp@intel.com/config)
> compiler: clang version 19.1.3
> (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b25
> 4b6afab99)
> reproduce (this is a W=1 build):
> (https://download.01.org/0day-ci/archive/20241101/202411011119.l3yRJp
> ht-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new
> version of
> the same patch/commit), kindly add following tags
> > Reported-by: kernel test robot <lkp@intel.com>
> > Closes:
> > https://lore.kernel.org/oe-kbuild-all/202411011119.l3yRJpht-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
> In file included from arch/x86/kvm/cpuid.c:13:
> In file included from include/linux/kvm_host.h:16:
> In file included from include/linux/mm.h:2213:
> include/linux/vmstat.h:504:43: warning: arithmetic between
> different enumeration types ('enum zone_stat_item' and 'enum
> numa_stat_item') [-Wenum-enum-conversion]
> 504 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
> | ~~~~~~~~~~~~~~~~~~~~~ ^
> 505 | item];
> | ~~~~
> include/linux/vmstat.h:511:43: warning: arithmetic between
> different enumeration types ('enum zone_stat_item' and 'enum
> numa_stat_item') [-Wenum-enum-conversion]
> 511 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
> | ~~~~~~~~~~~~~~~~~~~~~ ^
> 512 | NR_VM_NUMA_EVENT_ITEMS +
> | ~~~~~~~~~~~~~~~~~~~~~~
> include/linux/vmstat.h:518:36: warning: arithmetic between
> different enumeration types ('enum node_stat_item' and 'enum
> lru_list') [-Wenum-enum-conversion]
> 518 | return node_stat_name(NR_LRU_BASE + lru) + 3; //
> skip "nr_"
> | ~~~~~~~~~~~ ^ ~~~
> include/linux/vmstat.h:524:43: warning: arithmetic between
> different enumeration types ('enum zone_stat_item' and 'enum
> numa_stat_item') [-Wenum-enum-conversion]
> 524 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
> | ~~~~~~~~~~~~~~~~~~~~~ ^
> 525 | NR_VM_NUMA_EVENT_ITEMS +
> | ~~~~~~~~~~~~~~~~~~~~~~
> > > arch/x86/kvm/cpuid.c:1362:3: warning: label followed by a
> > > declaration is a C23 extension [-Wc23-extensions]
> 1362 | unsigned int ebx_mask = 0;
> | ^
> 5 warnings generated.
[...]
> 1361 case 0x80000021:
> > 1362 unsigned int ebx_mask = 0;
> 1363
> 1364 entry->ecx = entry->edx = 0;
> 1365 cpuid_entry_override(entry,
> CPUID_8000_0021_EAX);
> 1366
> 1367 /*
> 1368 * Bits 23:16 in EBX indicate the size of
> the RSB.
> 1369 * Expose the value in the hardware to the
> guest.
> 1370 */
> 1371 if (kvm_cpu_cap_has(X86_FEATURE_ERAPS))
> 1372 ebx_mask |= GENMASK(23, 16);
> 1373
> 1374 entry->ebx &= ebx_mask;
> 1375 break;
Right - I'll add braces around this case statement.
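Something like this (sketch; the braces open a block scope, so the
declaration no longer directly follows the case label):

    case 0x80000021: {
        unsigned int ebx_mask = 0;

        entry->ecx = entry->edx = 0;
        cpuid_entry_override(entry, CPUID_8000_0021_EAX);

        /*
         * Bits 23:16 in EBX indicate the size of the RSB.
         * Expose the value in the hardware to the guest.
         */
        if (kvm_cpu_cap_has(X86_FEATURE_ERAPS))
            ebx_mask |= GENMASK(23, 16);

        entry->ebx &= ebx_mask;
        break;
    }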
Amit
* Re: [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN
2024-10-31 15:39 ` [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN Amit Shah
2024-10-31 23:13 ` Pawan Gupta
2024-11-01 4:14 ` kernel test robot
@ 2024-11-04 5:18 ` Borislav Petkov
2024-11-04 11:16 ` Shah, Amit
2 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2024-11-04 5:18 UTC (permalink / raw)
To: Amit Shah
Cc: linux-kernel, kvm, x86, linux-doc, amit.shah, thomas.lendacky,
tglx, peterz, jpoimboe, pawan.kumar.gupta, corbet, mingo,
dave.hansen, hpa, seanjc, pbonzini, daniel.sneddon, kai.huang,
sandipan.das, boris.ostrovsky, Babu.Moger, david.kaplan
On Thu, Oct 31, 2024 at 04:39:25PM +0100, Amit Shah wrote:
> + if (boot_cpu_has(X86_FEATURE_ERAPS) && npt_enabled)
s/boot_cpu_has/cpu_feature_enabled/g
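I.e. (sketch):

    if (cpu_feature_enabled(X86_FEATURE_ERAPS) && npt_enabled)

cpu_feature_enabled() can be optimized away at compile time for
build-disabled features, unlike a plain boot_cpu_has() runtime check.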
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-10-31 23:03 ` Pawan Gupta
@ 2024-11-04 8:57 ` Shah, Amit
2024-11-04 14:52 ` Andrew Cooper
0 siblings, 1 reply; 24+ messages in thread
From: Shah, Amit @ 2024-11-04 8:57 UTC (permalink / raw)
To: pawan.kumar.gupta@linux.intel.com
Cc: corbet@lwn.net, kvm@vger.kernel.org, kai.huang@intel.com,
jpoimboe@kernel.org, dave.hansen@linux.intel.com,
Lendacky, Thomas, daniel.sneddon@linux.intel.com,
boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org,
seanjc@google.com, mingo@redhat.com, pbonzini@redhat.com,
tglx@linutronix.de, Moger, Babu, Das1, Sandipan,
linux-doc@vger.kernel.org, hpa@zytor.com, peterz@infradead.org,
bp@alien8.de, Kaplan, David, x86@kernel.org
On Thu, 2024-10-31 at 16:03 -0700, Pawan Gupta wrote:
> On Thu, Oct 31, 2024 at 04:39:24PM +0100, Amit Shah wrote:
> > From: Amit Shah <amit.shah@amd.com>
> >
> > Remove explicit RET stuffing / filling on VMEXITs and context
> > switches on AMD CPUs with the ERAPS feature (Turin+).
> >
> > With the Enhanced Return Address Prediction Security feature, any
> > hardware TLB flush results in flushing of the RSB (aka RAP in AMD
> > spec).
> > This guarantees an RSB flush across context switches.
>
> Is it that the mov to CR3 triggers the RSB flush?
The INVPCID instruction, which causes the TLB flush, is the trigger
here.
> > Feature documented in AMD PPR 57238.
>
> I couldn't find the ERAPS feature description here; I could only
> manage to find the bit position:
>
> 24 ERAPS. Read-only. Reset: 1. Indicates support for enhanced
> return address predictor security.
>
> Could you please point me to the document/section where this is
> described?
Unfortunately, that's all we have right now in the official
documentation.
I've put up some notes in
https://amitshah.net/2024/11/eraps-reduces-software-tax-for-hardware-bugs/
Thanks,
Amit
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-10-31 23:11 ` Dave Hansen
@ 2024-11-04 8:58 ` Shah, Amit
2024-11-04 16:11 ` Dave Hansen
0 siblings, 1 reply; 24+ messages in thread
From: Shah, Amit @ 2024-11-04 8:58 UTC (permalink / raw)
To: kvm@vger.kernel.org, dave.hansen@intel.com,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
x86@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com, kai.huang@intel.com,
pawan.kumar.gupta@linux.intel.com, jpoimboe@kernel.org,
dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
Lendacky, Thomas, seanjc@google.com, mingo@redhat.com,
pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
Das1, Sandipan, hpa@zytor.com, peterz@infradead.org, bp@alien8.de,
Kaplan, David
On Thu, 2024-10-31 at 16:11 -0700, Dave Hansen wrote:
> On 10/31/24 08:39, Amit Shah wrote:
> ...
> > With the Enhanced Return Address Prediction Security feature, any
> > hardware TLB flush results in flushing of the RSB (aka RAP in AMD
> > spec).
> > This guarantees an RSB flush across context switches.
>
> Check out the APM, volume 2: "5.5.1 Process Context Identifier"
>
> ... when system software switches address spaces (by writing
> ...
> CR3[62:12]), the processor may use TLB mappings previously
> stored for that address space and PCID, providing that bit
> 63 of
> the source operand is set to 1.
>
> tl;dr: PCIDs mean you don't necessarily flush the TLB on context
> switches.
Right - thanks, I'll have to reword that to say the RSB is flushed
along with the TLB - so any action that causes the TLB to be flushed
will also cause the RSB to be flushed.
* Re: [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN
2024-11-04 5:18 ` Borislav Petkov
@ 2024-11-04 11:16 ` Shah, Amit
0 siblings, 0 replies; 24+ messages in thread
From: Shah, Amit @ 2024-11-04 11:16 UTC (permalink / raw)
To: bp@alien8.de
Cc: corbet@lwn.net, pawan.kumar.gupta@linux.intel.com,
kai.huang@intel.com, jpoimboe@kernel.org,
dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
Lendacky, Thomas, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, seanjc@google.com, mingo@redhat.com,
pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
Das1, Sandipan, linux-doc@vger.kernel.org, hpa@zytor.com,
peterz@infradead.org, boris.ostrovsky@oracle.com, Kaplan, David,
x86@kernel.org
On Mon, 2024-11-04 at 06:18 +0100, Borislav Petkov wrote:
> On Thu, Oct 31, 2024 at 04:39:25PM +0100, Amit Shah wrote:
> > + if (boot_cpu_has(X86_FEATURE_ERAPS) && npt_enabled)
>
> s/boot_cpu_has/cpu_feature_enabled/g
ACK
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-04 8:57 ` Shah, Amit
@ 2024-11-04 14:52 ` Andrew Cooper
2024-11-04 15:00 ` Shah, Amit
0 siblings, 1 reply; 24+ messages in thread
From: Andrew Cooper @ 2024-11-04 14:52 UTC (permalink / raw)
To: amit.shah
Cc: Babu.Moger, David.Kaplan, Sandipan.Das, Thomas.Lendacky,
boris.ostrovsky, bp, corbet, daniel.sneddon, dave.hansen, hpa,
jpoimboe, kai.huang, kvm, linux-doc, linux-kernel, mingo,
pawan.kumar.gupta, pbonzini, peterz, seanjc, tglx, x86
> Unfortunately, that's all we have right now in the official
> documentation.
>
> I've put up some notes in
> https://amitshah.net/2024/11/eraps-reduces-software-tax-for-hardware-bugs/
I appreciate the attempt to get a few details out, but this is very
confused on a bunch of details.
Most importantly, you've described Intel RSB underflows, but named it
AMD BTC.
"Retbleed" is two totally different things. I begged the discoverers
to give it two names, and I also begged the x86 maintainers to not alias
them in Linux's view of the world, but alas.
AMD's BTC comes from a bad branch type prediction, and a late resteer
from the ret uop executing. It has nothing to do with RAS/RSB
underflow conditions.
~Andrew
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-04 14:52 ` Andrew Cooper
@ 2024-11-04 15:00 ` Shah, Amit
0 siblings, 0 replies; 24+ messages in thread
From: Shah, Amit @ 2024-11-04 15:00 UTC (permalink / raw)
To: andrew.cooper3@citrix.com
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com, kai.huang@intel.com,
jpoimboe@kernel.org, dave.hansen@linux.intel.com,
daniel.sneddon@linux.intel.com, Lendacky, Thomas,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
mingo@redhat.com, seanjc@google.com, pbonzini@redhat.com,
pawan.kumar.gupta@linux.intel.com, Moger, Babu, Das1, Sandipan,
linux-doc@vger.kernel.org, hpa@zytor.com, peterz@infradead.org,
bp@alien8.de, Kaplan, David, tglx@linutronix.de, x86@kernel.org
On Mon, 2024-11-04 at 14:52 +0000, Andrew Cooper wrote:
> > Unfortunately, that's all we have right now in the official
> > documentation.
> >
> > I've put up some notes in
> > https://amitshah.net/2024/11/eraps-reduces-software-tax-for-hardware-bugs/
>
> I appreciate the attempt to get a few details out, but this is very
> confused on a bunch of details.
>
> Most importantly, you've described Intel RSB underflows, but named it
> AMD BTC.
>
> "Retbleed" is two totally different things. I begged the
> discoverers
> to give it two names, and I also begged the x86 maintainers to not
> alias
> them in Linux's view of the world, but alas.
>
> AMD's BTC comes from a bad branch type prediction, and a late resteer
> from the ret uop executing. It has nothing to do with RAS/RSB
> underflow conditions.
BTC indeed is only branch-type confusion. The point I wanted to make
there is that to entirely get rid of X86_FEATURE_RSB_CTXSW, I had to
confirm that AMD CPUs do not speculate return addresses from the BTB or
BHB since BTC was fixed. (Or, in other words, to clarify the previous
comments there that said that AMD predicts from the BTB/BHB in every
case).
So - the only point in saying BTC_NO is relevant here is me confirming
that AMD is not going to speculate return addresses from outside of the
RSB. And that comment can now reflect reality.
Amit
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-04 8:58 ` Shah, Amit
@ 2024-11-04 16:11 ` Dave Hansen
2024-11-04 16:13 ` Shah, Amit
0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2024-11-04 16:11 UTC (permalink / raw)
To: Shah, Amit, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, x86@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com, kai.huang@intel.com,
pawan.kumar.gupta@linux.intel.com, jpoimboe@kernel.org,
dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
Lendacky, Thomas, seanjc@google.com, mingo@redhat.com,
pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
Das1, Sandipan, hpa@zytor.com, peterz@infradead.org, bp@alien8.de,
Kaplan, David
On 11/4/24 00:58, Shah, Amit wrote:
> Right - thanks, I'll have to reword that to say the RSB is flushed
> along with the TLB - so any action that causes the TLB to be flushed
> will also cause the RSB to be flushed.
Hold on though.
Is there a need for the RSB to be flushed at context switch? You talked
about it like there was a need:
> any hardware TLB flush results in flushing of the RSB (aka RAP in
> AMD spec). This guarantees an RSB flush across context switches.
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-04 16:11 ` Dave Hansen
@ 2024-11-04 16:13 ` Shah, Amit
2024-11-04 16:26 ` Dave Hansen
0 siblings, 1 reply; 24+ messages in thread
From: Shah, Amit @ 2024-11-04 16:13 UTC (permalink / raw)
To: kvm@vger.kernel.org, dave.hansen@intel.com,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
x86@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com, kai.huang@intel.com,
pawan.kumar.gupta@linux.intel.com, jpoimboe@kernel.org,
dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
Lendacky, Thomas, seanjc@google.com, mingo@redhat.com,
pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
Das1, Sandipan, hpa@zytor.com, peterz@infradead.org, bp@alien8.de,
Kaplan, David
On Mon, 2024-11-04 at 08:11 -0800, Dave Hansen wrote:
> On 11/4/24 00:58, Shah, Amit wrote:
> > Right - thanks, I'll have to reword that to say the RSB is flushed
> > along with the TLB - so any action that causes the TLB to be
> > flushed
> > will also cause the RSB to be flushed.
>
> Hold on though.
>
> Is there a need for the RSB to be flushed at context switch? You
> talked
> about it like there was a need:
>
> > any hardware TLB flush results in flushing of the RSB (aka RAP in
> > AMD spec). This guarantees an RSB flush across context switches.
I want to justify that not setting X86_FEATURE_RSB_CTXSW is still doing
the right thing, albeit in hardware.
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-04 16:13 ` Shah, Amit
@ 2024-11-04 16:26 ` Dave Hansen
2024-11-04 17:22 ` Shah, Amit
0 siblings, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2024-11-04 16:26 UTC (permalink / raw)
To: Shah, Amit, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, x86@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com, kai.huang@intel.com,
pawan.kumar.gupta@linux.intel.com, jpoimboe@kernel.org,
dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
Lendacky, Thomas, seanjc@google.com, mingo@redhat.com,
pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
Das1, Sandipan, hpa@zytor.com, peterz@infradead.org, bp@alien8.de,
Kaplan, David
On 11/4/24 08:13, Shah, Amit wrote:
> I want to justify that not setting X86_FEATURE_RSB_CTXSW is still doing
> the right thing, albeit in hardware.
Let's back up a bit.
In the kernel, we have security concerns if RSB contents remain across
context switches. If process A's RSB entries are left and then process
B uses them, there's a problem.
Today, we mitigate that issue with manual kernel RSB state zapping on
context switches (X86_FEATURE_RSB_CTXSW).
You're saying that this fancy new ERAPS feature includes a new mechanism
to zap RSB state. But that only triggers "each time a TLB flush happens".
So what you're saying above is that you are concerned about RSB contents
sticking around across context switches. But instead of using
X86_FEATURE_RSB_CTXSW, you believe that the new TLB-flush-triggered
ERAPS flush can be used instead.
Are we all on the same page so far?
I think you're wrong. We can't depend on ERAPS for this. Linux doesn't
flush the TLB on context switches when PCIDs are in play. Thus, ERAPS
won't flush the RSB and will leave bad state in there and will leave the
system vulnerable.
Or what am I missing?
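(Background for the PCID point: with PCIDs, the context-switch CR3 write
sets the no-flush bit, so no TLB flush occurs at all. A simplified
sketch of the helper in arch/x86/mm/tlb.c - the real function takes a
pgd_t pointer and, on newer kernels, a LAM argument:)

	#define X86_CR3_PCID_NOFLUSH	(1ULL << 63)	/* CR3 bit 63: suppress TLB flush */

	/*
	 * Simplified from arch/x86/mm/tlb.c: when switching to an mm
	 * whose PCID is still live on this CPU, the kernel writes CR3
	 * with the no-flush bit set - no TLB flush, hence (per the
	 * concern above) no TLB-flush-triggered RSB flush either.
	 */
	static unsigned long build_cr3_noflush(unsigned long pgd_pa, u16 asid)
	{
		return pgd_pa | (asid + 1) | X86_CR3_PCID_NOFLUSH;
	}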
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-04 16:26 ` Dave Hansen
@ 2024-11-04 17:22 ` Shah, Amit
2024-11-04 17:45 ` Dave Hansen
0 siblings, 1 reply; 24+ messages in thread
From: Shah, Amit @ 2024-11-04 17:22 UTC (permalink / raw)
To: kvm@vger.kernel.org, dave.hansen@intel.com,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
x86@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com, kai.huang@intel.com,
pawan.kumar.gupta@linux.intel.com, jpoimboe@kernel.org,
dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
Lendacky, Thomas, seanjc@google.com, mingo@redhat.com,
pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
Das1, Sandipan, hpa@zytor.com, peterz@infradead.org, bp@alien8.de,
Kaplan, David
On Mon, 2024-11-04 at 08:26 -0800, Dave Hansen wrote:
> On 11/4/24 08:13, Shah, Amit wrote:
> > I want to justify that not setting X86_FEATURE_RSB_CTXSW is still
> > doing
> > the right thing, albeit in hardware.
>
> Let's back up a bit.
>
> In the kernel, we have security concerns if RSB contents remain
> across
> context switches. If process A's RSB entries are left and then
> process
> B uses them, there's a problem.
>
> Today, we mitigate that issue with manual kernel RSB state zapping on
> context switches (X86_FEATURE_RSB_CTXSW).
>
> You're saying that this fancy new ERAPS feature includes a new
> mechanism
> to zap RSB state. But that only triggers "each time a TLB flush
> happens".
>
> So what you're saying above is that you are concerned about RSB
> contents
> sticking around across context switches. But instead of using
> X86_FEATURE_RSB_CTXSW, you believe that the new TLB-flush-triggered
> ERAPS flush can be used instead.
>
> Are we all on the same page so far?
All good so far.
> I think you're wrong. We can't depend on ERAPS for this. Linux
> doesn't
> flush the TLB on context switches when PCIDs are in play. Thus,
> ERAPS
> won't flush the RSB and will leave bad state in there and will leave
> the
> system vulnerable.
>
> Or what am I missing?
I just received confirmation from our hardware engineers on this too:
1. the RSB is flushed when CR3 is updated
2. the RSB is flushed when INVPCID is issued (except type 0 - single
address).
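(For reference, the four INVPCID invalidation types, with constant names
following the kernel's arch/x86/include/asm/invpcid.h; the RSB notes
apply point 2 above and are not from the PPR:)

	/* Per point 2, on ERAPS hardware only type 0 leaves the RSB alone. */
	#define INVPCID_TYPE_INDIV_ADDR		0	/* one address, one PCID: no RSB flush */
	#define INVPCID_TYPE_SINGLE_CTXT	1	/* all entries for one PCID: RSB flushed */
	#define INVPCID_TYPE_ALL_INCL_GLOBAL	2	/* all PCIDs + globals: RSB flushed */
	#define INVPCID_TYPE_ALL_NON_GLOBAL	3	/* all PCIDs, no globals: RSB flushed */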
I didn't mention 1. so far, which led to your question, right? Does
this now cover all the cases?
Amit
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-04 17:22 ` Shah, Amit
@ 2024-11-04 17:45 ` Dave Hansen
2024-11-05 0:04 ` Andrew Cooper
2024-11-05 10:39 ` Shah, Amit
0 siblings, 2 replies; 24+ messages in thread
From: Dave Hansen @ 2024-11-04 17:45 UTC (permalink / raw)
To: Shah, Amit, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, x86@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com, kai.huang@intel.com,
pawan.kumar.gupta@linux.intel.com, jpoimboe@kernel.org,
dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
Lendacky, Thomas, seanjc@google.com, mingo@redhat.com,
pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
Das1, Sandipan, hpa@zytor.com, peterz@infradead.org, bp@alien8.de,
Kaplan, David
On 11/4/24 09:22, Shah, Amit wrote:
>> I think you're wrong. We can't depend on ERAPS for this. Linux
>> doesn't flush the TLB on context switches when PCIDs are in play.
>> Thus, ERAPS won't flush the RSB and will leave bad state in there
>> and will leave the system vulnerable.
>>
>> Or what am I missing?
> I just received confirmation from our hardware engineers on this too:
>
> 1. the RSB is flushed when CR3 is updated
> 2. the RSB is flushed when INVPCID is issued (except type 0 - single
> address).
>
> I didn't mention 1. so far, which led to your question, right?
Not only did you not mention it, you said something _completely_
different. So, where's the documentation for this thing? I dug through
the 57230 .zip file and I see the CPUID bit:
24 ERAPS. Read-only. Reset: 1. Indicates support for enhanced
return address predictor security.
but nothing telling us how it works.
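(For reference, that enumeration is CPUID Fn8000_0021_EAX bit 24; in
Linux terms that leaf is feature word 20, so the series' definition
presumably looks like the sketch below - the feature name is the
patch's, not upstream's:)

	/* Sketch of the cpufeatures.h addition: word 20 is CPUID leaf
	 * 0x80000021 EAX; bit 24 is the ERAPS bit quoted above. */
	#define X86_FEATURE_ERAPS	(20*32+24) /* Enhanced Return Address Predictor Security */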
> Does this now cover all the cases?
Nope, it's worse than I thought. Look at:
> SYM_FUNC_START(__switch_to_asm)
...
> FILL_RETURN_BUFFER %r12, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW
which does the RSB fill at the same time it switches RSP.
So we feel the need to flush the RSB on *ALL* task switches. That
includes switches between threads in a process *AND* switches over to
kernel threads from user ones.
So, I'll flip this back around. Today, X86_FEATURE_RSB_CTXSW zaps the
RSB whenever RSP is updated to a new task stack. Please convince me
that ERAPS provides superior coverage or is unnecessary in all the
possible combinations switching between:
different thread, same mm
user=>kernel, same mm
kernel=>user, same mm
different mm (we already covered this)
Because several of those switches can happen without a CR3 write or INVPCID.
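(The switching code in question, as a simplified sketch of
context_switch() in kernel/sched/core.c: switches to kernel threads and
switches between threads of one process never reach a CR3 write or
INVPCID.)

	/* Simplified from kernel/sched/core.c:context_switch(). */
	if (!next->mm) {			/* switching to a kernel thread */
		enter_lazy_tlb(prev->active_mm, next);
		next->active_mm = prev->active_mm; /* borrow prev's mm: no CR3 write */
	} else {
		switch_mm_irqs_off(prev->active_mm, next->mm, next);
		/* ...which takes a fast path when prev mm == next mm
		 * (thread switch within a process), again without
		 * writing CR3 */
	}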
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-04 17:45 ` Dave Hansen
@ 2024-11-05 0:04 ` Andrew Cooper
2024-11-05 10:39 ` Shah, Amit
1 sibling, 0 replies; 24+ messages in thread
From: Andrew Cooper @ 2024-11-05 0:04 UTC (permalink / raw)
To: dave.hansen
Cc: Amit.Shah, Babu.Moger, David.Kaplan, Sandipan.Das,
Thomas.Lendacky, boris.ostrovsky, bp, corbet, daniel.sneddon,
dave.hansen, hpa, jpoimboe, kai.huang, kvm, linux-doc,
linux-kernel, mingo, pawan.kumar.gupta, pbonzini, peterz, seanjc,
tglx, x86
> So, I'll flip this back around. Today, X86_FEATURE_RSB_CTXSW zaps the
> RSB whenever RSP is updated to a new task stack. Please convince me
> that ERAPS provides superior coverage or is unnecessary in all the
> possible combinations switching between:
>
> different thread, same mm
> user=>kernel, same mm
> kernel=>user, same mm
> different mm (we already covered this)
>
> Because several of those switches can happen without a CR3 write or INVPCID.
user=>kernel=>user, same mm explicitly does not want to flush the RAS,
because if the system call is shallow enough, some of the userspace RAS
is still intact when you get back into user mode.
The case which I expect will go wrong is user=>kernel=>different kthread
because this stays on the same mm.
That does need to flush the RAS and won't hit any TLB maintenance
instructions that I'm aware of.
~Andrew
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-04 17:45 ` Dave Hansen
2024-11-05 0:04 ` Andrew Cooper
@ 2024-11-05 10:39 ` Shah, Amit
2024-11-05 14:54 ` Kaplan, David
2024-11-05 16:19 ` Dave Hansen
1 sibling, 2 replies; 24+ messages in thread
From: Shah, Amit @ 2024-11-05 10:39 UTC (permalink / raw)
To: kvm@vger.kernel.org, dave.hansen@intel.com,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
x86@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com, kai.huang@intel.com,
pawan.kumar.gupta@linux.intel.com, jpoimboe@kernel.org,
dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
Lendacky, Thomas, seanjc@google.com, mingo@redhat.com,
pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
Das1, Sandipan, hpa@zytor.com, peterz@infradead.org, bp@alien8.de,
Kaplan, David
On Mon, 2024-11-04 at 09:45 -0800, Dave Hansen wrote:
> On 11/4/24 09:22, Shah, Amit wrote:
> > > I think you're wrong. We can't depend on ERAPS for this. Linux
> > > doesn't flush the TLB on context switches when PCIDs are in play.
> > > Thus, ERAPS won't flush the RSB and will leave bad state in there
> > > and will leave the system vulnerable.
> > >
> > > Or what am I missing?
> > I just received confirmation from our hardware engineers on this
> > too:
> >
> > 1. the RSB is flushed when CR3 is updated
> > 2. the RSB is flushed when INVPCID is issued (except type 0 -
> > single
> > address).
> >
> > I didn't mention 1. so far, which led to your question, right?
>
> Not only did you not mention it, you said something _completely_
> different. So, where's the documentation for this thing? I dug
> through
> the 57230 .zip file and I see the CPUID bit:
>
> 24 ERAPS. Read-only. Reset: 1. Indicates support for
> enhanced
> return address predictor security.
>
> but nothing telling us how it works.
I'm expecting the APM update to come out soon, but I have put together
https://amitshah.net/2024/11/eraps-reduces-software-tax-for-hardware-bugs/
based on information I have. I think it's mostly consistent with what
I've said so far - with the exception of the mov-CR3 flush, which was
only confirmed yesterday.
> > Does this now cover all the cases?
>
> Nope, it's worse than I thought. Look at:
>
> > SYM_FUNC_START(__switch_to_asm)
> ...
> > FILL_RETURN_BUFFER %r12, RSB_CLEAR_LOOPS,
> > X86_FEATURE_RSB_CTXSW
>
> which does the RSB fill at the same time it switches RSP.
>
> So we feel the need to flush the RSB on *ALL* task switches. That
> includes switches between threads in a process *AND* switches over to
> kernel threads from user ones.
(since these cases are the same as those listed below, I'll only reply
in one place)
> So, I'll flip this back around. Today, X86_FEATURE_RSB_CTXSW zaps
> the
> RSB whenever RSP is updated to a new task stack. Please convince me
> that ERAPS provides superior coverage or is unnecessary in all the
> possible combinations switching between:
>
> different thread, same mm
This case is the same userspace process with valid addresses in the RSB
for that process. An invalid speculation isn't security sensitive,
just a misprediction that won't be retired. So we are good here.
> user=>kernel, same mm
> kernel=>user, same mm
user-kernel is protected with SMEP. Also, we don't call
FILL_RETURN_BUFFER for these switches?
> different mm (we already covered this)
>
> Because several of those switches can happen without a CR3 write or
> INVPCID.
(that covers all of them IIRC)
Amit
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-05 10:39 ` Shah, Amit
@ 2024-11-05 14:54 ` Kaplan, David
2024-11-05 16:19 ` Dave Hansen
1 sibling, 0 replies; 24+ messages in thread
From: Kaplan, David @ 2024-11-05 14:54 UTC (permalink / raw)
To: Shah, Amit, kvm@vger.kernel.org, dave.hansen@intel.com,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
x86@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com, kai.huang@intel.com,
pawan.kumar.gupta@linux.intel.com, jpoimboe@kernel.org,
dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
Lendacky, Thomas, seanjc@google.com, mingo@redhat.com,
pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
Das1, Sandipan, hpa@zytor.com, peterz@infradead.org, bp@alien8.de,
Andrew Cooper
[AMD Official Use Only - AMD Internal Distribution Only]
> -----Original Message-----
> From: Shah, Amit <Amit.Shah@amd.com>
> Sent: Tuesday, November 5, 2024 4:40 AM
> To: kvm@vger.kernel.org; dave.hansen@intel.com; linux-kernel@vger.kernel.org;
> linux-doc@vger.kernel.org; x86@kernel.org
> Cc: corbet@lwn.net; boris.ostrovsky@oracle.com; kai.huang@intel.com;
> pawan.kumar.gupta@linux.intel.com; jpoimboe@kernel.org;
> dave.hansen@linux.intel.com; daniel.sneddon@linux.intel.com; Lendacky, Thomas
> <Thomas.Lendacky@amd.com>; seanjc@google.com; mingo@redhat.com;
> pbonzini@redhat.com; tglx@linutronix.de; Moger, Babu <Babu.Moger@amd.com>;
> Das1, Sandipan <Sandipan.Das@amd.com>; hpa@zytor.com;
> peterz@infradead.org; bp@alien8.de; Kaplan, David <David.Kaplan@amd.com>
> Subject: Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
>
> On Mon, 2024-11-04 at 09:45 -0800, Dave Hansen wrote:
> > On 11/4/24 09:22, Shah, Amit wrote:
> > > > I think you're wrong. We can't depend on ERAPS for this. Linux
> > > > doesn't flush the TLB on context switches when PCIDs are in play.
> > > > Thus, ERAPS won't flush the RSB and will leave bad state in there
> > > > and will leave the system vulnerable.
> > > >
> > > > Or what am I missing?
> > > I just received confirmation from our hardware engineers on this
> > > too:
> > >
> > > 1. the RSB is flushed when CR3 is updated
> > > 2. the RSB is flushed when INVPCID is issued (except type 0 - single
> > > address).
> > >
> > > I didn't mention 1. so far, which led to your question, right?
> >
> > Not only did you not mention it, you said something _completely_
> > different. So, where's the documentation for this thing? I dug through
> > the 57230 .zip file and I see the CPUID bit:
> >
> > 24 ERAPS. Read-only. Reset: 1. Indicates support for enhanced
> > return address predictor security.
> >
> > but nothing telling us how it works.
>
> I'm expecting the APM update to come out soon, but I have put together
>
> https://amitshah.net/2024/11/eraps-reduces-software-tax-for-hardware-bugs/
>
> based on information I have. I think it's mostly consistent with what I've said so far -
> with the exception of the mov-CR3 flush, which was only confirmed yesterday.
>
> > > Does this now cover all the cases?
> >
> > Nope, it's worse than I thought. Look at:
> >
> > > SYM_FUNC_START(__switch_to_asm)
> > ...
> > > FILL_RETURN_BUFFER %r12, RSB_CLEAR_LOOPS,
> > > X86_FEATURE_RSB_CTXSW
> >
> > which does the RSB fill at the same time it switches RSP.
> >
> > So we feel the need to flush the RSB on *ALL* task switches. That
> > includes switches between threads in a process *AND* switches over to
> > kernel threads from user ones.
>
> (since these cases are the same as those listed below, I'll only reply in one place)
>
> > So, I'll flip this back around. Today, X86_FEATURE_RSB_CTXSW zaps the
> > RSB whenever RSP is updated to a new task stack. Please convince me
> > that ERAPS provides superior coverage or is unnecessary in all the
> > possible combinations switching between:
> >
> > different thread, same mm
>
> This case is the same userspace process with valid addresses in the RSB for that
> process. An invalid speculation isn't security sensitive, just a misprediction that
> won't be retired. So we are good here.
I think it's more accurate to say that the addresses in the RSB may be incorrect for the new thread, but they're still valid return sites since it's the same mm. There are other existing cases (such as deep call trees) where RSB predictions for a thread may be incorrect but still point to valid return sites.
We could still trigger an RSB flush here, I suppose, but I think there's an argument that it is unnecessary.
--David Kaplan
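(A toy illustration of the deep-call-tree case: any call chain deeper
than the RSB's capacity - assumed here to be 32 entries - already
retires through stale-but-valid same-mm predictions. This user-space
example is illustrative only, not from the thread or the patch:)

	#include <stdio.h>

	/*
	 * With a 32-entry RSB, the returns from the deepest 32 frames
	 * are predicted correctly; returns above that depth hit entries
	 * that are stale yet still point at valid return sites within
	 * this same mm - exactly the "incorrect but valid" case.
	 */
	static int deep(int n)
	{
		if (n == 0)
			return 0;
		return 1 + deep(n - 1);
	}

	int main(void)
	{
		printf("%d\n", deep(64));	/* 64 nested calls > 32 RSB entries */
		return 0;
	}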
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-05 10:39 ` Shah, Amit
2024-11-05 14:54 ` Kaplan, David
@ 2024-11-05 16:19 ` Dave Hansen
2024-11-05 16:25 ` Shah, Amit
1 sibling, 1 reply; 24+ messages in thread
From: Dave Hansen @ 2024-11-05 16:19 UTC (permalink / raw)
To: Shah, Amit, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, x86@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com, kai.huang@intel.com,
pawan.kumar.gupta@linux.intel.com, jpoimboe@kernel.org,
dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
Lendacky, Thomas, seanjc@google.com, mingo@redhat.com,
pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
Das1, Sandipan, hpa@zytor.com, peterz@infradead.org, bp@alien8.de,
Kaplan, David
On 11/5/24 02:39, Shah, Amit wrote:
> On Mon, 2024-11-04 at 09:45 -0800, Dave Hansen wrote:
> I'm expecting the APM update to come out soon, but I have put together
>
> https://amitshah.net/2024/11/eraps-reduces-software-tax-for-hardware-bugs/
>
> based on information I have. I think it's mostly consistent with what
> I've said so far - with the exception of the mov-CR3 flush, which was
> only confirmed yesterday.
That's better. But your original cover letter did say:
Feature documented in AMD PPR 57238.
which is technically true because the _bit_ is defined. But it's far,
far from being sufficiently documented for Linux to actually use it.
Could we please be more careful about these in the future?
>> So, I'll flip this back around. Today, X86_FEATURE_RSB_CTXSW zaps
>> the
>> RSB whenever RSP is updated to a new task stack. Please convince me
>> that ERAPS provides superior coverage or is unnecessary in all the
>> possible combinations switching between:
>>
>> different thread, same mm
>
> This case is the same userspace process with valid addresses in the RSB
> for that process. An invalid speculation isn't security sensitive,
> just a misprediction that won't be retired. So we are good here.
Does that match what the __switch_to_asm comment says, though?
> /*
> * When switching from a shallower to a deeper call stack
> * the RSB may either underflow or use entries populated
> * with userspace addresses. On CPUs where those concerns
> * exist, overwrite the RSB with entries which capture
> * speculative execution to prevent attack.
> */
It is also talking just about call depth, not about same-address-space
RSB entries being harmless. That's because this is also trying to avoid
having the kernel consume any user-placed RSB entries, regardless of
whether they're from the same mm or not.
>> user=>kernel, same mm
>> kernel=>user, same mm
>
> user-kernel is protected with SMEP. Also, we don't call
> FILL_RETURN_BUFFER for these switches?
Amit, I'm beginning to fear that you haven't gone and looked at the
relevant code here. Please go look at SYM_FUNC_START(__switch_to_asm)
in arch/x86/entry/entry_64.S. I believe this code is called for all
task switches, including switching from a user task to a kernel task. I
also believe that FILL_RETURN_BUFFER is used unconditionally for every
__switch_to_asm call (when X86_FEATURE_RSB_CTXSW is on of course).
Could we please start over on this patch?
Let's get the ERAPS+TLB-flush nonsense out of the kernel and get the
commit message right.
Then let's go from there.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature
2024-11-05 16:19 ` Dave Hansen
@ 2024-11-05 16:25 ` Shah, Amit
0 siblings, 0 replies; 24+ messages in thread
From: Shah, Amit @ 2024-11-05 16:25 UTC (permalink / raw)
To: kvm@vger.kernel.org, dave.hansen@intel.com,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
x86@kernel.org
Cc: corbet@lwn.net, boris.ostrovsky@oracle.com, kai.huang@intel.com,
pawan.kumar.gupta@linux.intel.com, jpoimboe@kernel.org,
dave.hansen@linux.intel.com, daniel.sneddon@linux.intel.com,
Lendacky, Thomas, seanjc@google.com, mingo@redhat.com,
pbonzini@redhat.com, tglx@linutronix.de, Moger, Babu,
Das1, Sandipan, hpa@zytor.com, peterz@infradead.org, bp@alien8.de,
Kaplan, David
On Tue, 2024-11-05 at 08:19 -0800, Dave Hansen wrote:
> On 11/5/24 02:39, Shah, Amit wrote:
> > On Mon, 2024-11-04 at 09:45 -0800, Dave Hansen wrote:
> > I'm expecting the APM update to come out soon, but I have put together
> >
> > https://amitshah.net/2024/11/eraps-reduces-software-tax-for-hardware-bugs/
> >
> > based on information I have. I think it's mostly consistent with
> > what I've said so far - with the exception of the mov-CR3 flush,
> > which was only confirmed yesterday.
>
> That's better. But your original cover letter did say:
>
> Feature documented in AMD PPR 57238.
>
> which is technically true because the _bit_ is defined. But it's
> far,
> far from being sufficiently documented for Linux to actually use it.
Yea; apologies.
> Could we please be more careful about these in the future?
>
> > > So, I'll flip this back around. Today, X86_FEATURE_RSB_CTXSW
> > > zaps
> > > the
> > > RSB whenever RSP is updated to a new task stack. Please convince
> > > me
> > > that ERAPS provides superior coverage or is unnecessary in all
> > > the
> > > possible combinations switching between:
> > >
> > > different thread, same mm
> >
> > This case is the same userspace process with valid addresses in the
> > RSB
> > for that process. An invalid speculation isn't security sensitive,
> > just a misprediction that won't be retired. So we are good here.
>
> Does that match what the __switch_to_asm comment says, though?
>
> > /*
> > * When switching from a shallower to a deeper call stack
> > * the RSB may either underflow or use entries populated
> > * with userspace addresses. On CPUs where those concerns
> > * exist, overwrite the RSB with entries which capture
> > * speculative execution to prevent attack.
> > */
>
> It is also talking just about call depth, not about same-address-
> space
> RSB entries being harmless. That's because this is also trying to
> avoid
> having the kernel consume any user-placed RSB entries, regardless of
> whether they're from the same mm or not.
>
> > > user=>kernel, same mm
> > > kernel=>user, same mm
> >
> > user-kernel is protected with SMEP. Also, we don't call
> > FILL_RETURN_BUFFER for these switches?
>
> Amit, I'm beginning to fear that you haven't gone and looked at the
> relevant code here. Please go look at
> SYM_FUNC_START(__switch_to_asm)
> in arch/x86/entry/entry_64.S. I believe this code is called for all
> task switches, including switching from a user task to a kernel
> task. I
> also believe that FILL_RETURN_BUFFER is used unconditionally for
> every
> __switch_to_asm call (when X86_FEATURE_RSB_CTXSW is on of course).
>
> Could we please start over on this patch?
>
> Let's get the ERAPS+TLB-flush nonsense out of the kernel and get the
> commit message right.
>
> Then let's go from there.
Alright - you've been really patient, so thanks for that. I agree I'll
post a v2 with updated commit messages, and then continue this
discussion on user/kernel task switches. And I'll also add an RFC tag to
it to ensure it doesn't get picked up.
Amit
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2024-11-05 16:25 UTC | newest]
Thread overview: 24+ messages -- links below jump to the message on this page --
2024-10-31 15:39 [PATCH 0/2] Add support for the ERAPS feature Amit Shah
2024-10-31 15:39 ` [PATCH 1/2] x86: cpu/bugs: add support for AMD " Amit Shah
2024-10-31 23:03 ` Pawan Gupta
2024-11-04 8:57 ` Shah, Amit
2024-11-04 14:52 ` Andrew Cooper
2024-11-04 15:00 ` Shah, Amit
2024-10-31 23:11 ` Dave Hansen
2024-11-04 8:58 ` Shah, Amit
2024-11-04 16:11 ` Dave Hansen
2024-11-04 16:13 ` Shah, Amit
2024-11-04 16:26 ` Dave Hansen
2024-11-04 17:22 ` Shah, Amit
2024-11-04 17:45 ` Dave Hansen
2024-11-05 0:04 ` Andrew Cooper
2024-11-05 10:39 ` Shah, Amit
2024-11-05 14:54 ` Kaplan, David
2024-11-05 16:19 ` Dave Hansen
2024-11-05 16:25 ` Shah, Amit
2024-10-31 15:39 ` [PATCH 2/2] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN Amit Shah
2024-10-31 23:13 ` Pawan Gupta
2024-11-01 4:14 ` kernel test robot
2024-11-01 9:26 ` Amit Shah
2024-11-04 5:18 ` Borislav Petkov
2024-11-04 11:16 ` Shah, Amit