[PATCH v2 0/3] MSR_IA32_ARCH_CAPABILITIES cleanups

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/3] MSR_IA32_ARCH_CAPABILITIES cleanups
@ 2023-05-24  6:16 Chao Gao
  2023-05-24  6:16 ` [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps Chao Gao
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Chao Gao @ 2023-05-24  6:16 UTC (permalink / raw)
  To: kvm, x86
  Cc: xiaoyao.li, Chao Gao, Alexandre Chartre, Ashok Raj,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Ingo Molnar,
	Juergen Gross, Kim Phillips, linux-kernel, Paolo Bonzini,
	Pawan Gupta, Peter Zijlstra, Sean Christopherson, Thomas Gleixner

Do some cleanups about MSR_IA32_ARCH_CAPABILITIES in KVM e.g, fix stale
comments, elimite open-coded MSR reads, track features that can be exposed to
guests to avoid computing them runtime. They are found when reading the code.

No functional change intented.

Changes from v1 -> v2:
1. toggle ARCH_CAP_SKIP_VMENTRY_L1DFLUSH bit when l1tf mitigation state is
   changed by module param. [Xiaoyao, Sean]
2. add more cleanups (patch 2/3)


Chao Gao (3):
  KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps
  KVM: x86: Correct the name for skipping VMENTER l1d flush
  x86/cpu, KVM: Use helper function to read MSR_IA32_ARCH_CAPABILITIES

 arch/x86/kernel/cpu/common.c |  1 +
 arch/x86/kvm/vmx/vmx.c       | 44 +++++++++++++++++++++++-------------
 arch/x86/kvm/x86.c           | 16 +++++--------
 arch/x86/kvm/x86.h           |  1 +
 4 files changed, 36 insertions(+), 26 deletions(-)


base-commit: b9846a698c9aff4eb2214a06ac83638ad098f33f
-- 
2.40.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps
  2023-05-24  6:16 [PATCH v2 0/3] MSR_IA32_ARCH_CAPABILITIES cleanups Chao Gao
@ 2023-05-24  6:16 ` Chao Gao
  2023-05-24  8:14   ` Xiaoyao Li
  2023-06-06 16:40   ` Sean Christopherson
  2023-05-24  6:16 ` [PATCH v2 2/3] KVM: x86: Correct the name for skipping VMENTER l1d flush Chao Gao
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Chao Gao @ 2023-05-24  6:16 UTC (permalink / raw)
  To: kvm, x86
  Cc: xiaoyao.li, Chao Gao, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, linux-kernel

to avoid computing the supported value at runtime every time.

Toggle the ARCH_CAP_SKIP_VMENTRY_L1DFLUSH bit when l1tf_vmx_mitigation
is modified to achieve the same result as runtime computing.

Opportunistically, add a comment to document the problem of allowing
changing the supported value of ARCH_CAPABILITIES and the reason why
we don't fix it.

No functional change intended.

Link: https://lore.kernel.org/all/ZGZhW%2Fx5OWPmx1qD@google.com/
Link: https://lore.kernel.org/all/ZGeU9sYTPxqNGSqI@google.com/
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 25 +++++++++++++++++++++++--
 arch/x86/kvm/x86.c     |  7 ++++---
 arch/x86/kvm/x86.h     |  1 +
 3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..8274ef5e89e5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -309,10 +309,31 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 
 	l1tf_vmx_mitigation = l1tf;
 
-	if (l1tf != VMENTER_L1D_FLUSH_NEVER)
+	/*
+	 * Update static keys and supported arch capabilities according to
+	 * the new mitigation state.
+	 *
+	 * ARCH_CAP_SKIP_VMENTRY_L1DFLUSH is toggled because if we do cache
+	 * flushes for L1 guests on (nested) vmlaunch/vmresume to L2, L1
+	 * guests can skip the flush and if we don't, then L1 guests need
+	 * to do a flush.
+	 *
+	 * Toggling ARCH_CAP_SKIP_VMENTRY_L1DFLUSH may present inconsistent
+	 * model to the guest, e.g., if userspace isn't careful, a VM can
+	 * have vCPUs with different values for ARCH_CAPABILITIES. But
+	 * there is almost no chance to fix the issue. Because, to present
+	 * a consistent model, KVM essentially needs to disallow changing
+	 * the module param after VMs/vCPUs have been created, but that
+	 * would prevent userspace from toggling the param while VMs are
+	 * running, e.g., in response to a new vulnerability.
+	 */
+	if (l1tf != VMENTER_L1D_FLUSH_NEVER) {
 		static_branch_enable(&vmx_l1d_should_flush);
-	else
+		kvm_caps.supported_arch_cap |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
+	} else {
 		static_branch_disable(&vmx_l1d_should_flush);
+		kvm_caps.supported_arch_cap &= ~ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
+	}
 
 	if (l1tf == VMENTER_L1D_FLUSH_COND)
 		static_branch_enable(&vmx_l1d_flush_cond);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c0778ca39650..2408b5f554b7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1672,7 +1672,7 @@ static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
 	case MSR_IA32_ARCH_CAPABILITIES:
-		msr->data = kvm_get_arch_capabilities();
+		msr->data = kvm_caps.supported_arch_cap;
 		break;
 	case MSR_IA32_PERF_CAPABILITIES:
 		msr->data = kvm_caps.supported_perf_cap;
@@ -7156,7 +7156,7 @@ static void kvm_probe_msr_to_save(u32 msr_index)
 			return;
 		break;
 	case MSR_IA32_TSX_CTRL:
-		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
+		if (!(kvm_caps.supported_arch_cap & ARCH_CAP_TSX_CTRL_MSR))
 			return;
 		break;
 	default:
@@ -9532,6 +9532,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		kvm_caps.max_guest_tsc_khz = max;
 	}
 	kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
+	kvm_caps.supported_arch_cap = kvm_get_arch_capabilities();
 	kvm_init_msr_lists();
 	return 0;
 
@@ -11895,7 +11896,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	if (r)
 		goto free_guest_fpu;
 
-	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
+	vcpu->arch.arch_capabilities = kvm_caps.supported_arch_cap;
 	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
 	kvm_xen_init_vcpu(vcpu);
 	kvm_vcpu_mtrr_init(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index c544602d07a3..d3e524bcc169 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -29,6 +29,7 @@ struct kvm_caps {
 	u64 supported_xcr0;
 	u64 supported_xss;
 	u64 supported_perf_cap;
+	u64 supported_arch_cap;
 };
 
 void kvm_spurious_fault(void);
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/3] KVM: x86: Correct the name for skipping VMENTER l1d flush
  2023-05-24  6:16 [PATCH v2 0/3] MSR_IA32_ARCH_CAPABILITIES cleanups Chao Gao
  2023-05-24  6:16 ` [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps Chao Gao
@ 2023-05-24  6:16 ` Chao Gao
  2023-05-24  8:25   ` Xiaoyao Li
  2023-05-24  6:16 ` [PATCH v2 3/3] x86/cpu, KVM: Use helper function to read MSR_IA32_ARCH_CAPABILITIES Chao Gao
  2023-06-06 17:26 ` [PATCH v2 0/3] MSR_IA32_ARCH_CAPABILITIES cleanups Sean Christopherson
  3 siblings, 1 reply; 11+ messages in thread
From: Chao Gao @ 2023-05-24  6:16 UTC (permalink / raw)
  To: kvm, x86
  Cc: xiaoyao.li, Chao Gao, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, linux-kernel

There is no VMENTER_L1D_FLUSH_NESTED_VM. It should be
ARCH_CAP_SKIP_VMENTRY_L1DFLUSH.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kvm/x86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2408b5f554b7..f7838260c183 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1631,7 +1631,7 @@ static u64 kvm_get_arch_capabilities(void)
 	 * If we're doing cache flushes (either "always" or "cond")
 	 * we will do one whenever the guest does a vmlaunch/vmresume.
 	 * If an outer hypervisor is doing the cache flush for us
-	 * (VMENTER_L1D_FLUSH_NESTED_VM), we can safely pass that
+	 * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
 	 * capability to the guest too, and if EPT is disabled we're not
 	 * vulnerable.  Overall, only VMENTER_L1D_FLUSH_NEVER will
 	 * require a nested hypervisor to do a flush of its own.
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/3] x86/cpu, KVM: Use helper function to read MSR_IA32_ARCH_CAPABILITIES
  2023-05-24  6:16 [PATCH v2 0/3] MSR_IA32_ARCH_CAPABILITIES cleanups Chao Gao
  2023-05-24  6:16 ` [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps Chao Gao
  2023-05-24  6:16 ` [PATCH v2 2/3] KVM: x86: Correct the name for skipping VMENTER l1d flush Chao Gao
@ 2023-05-24  6:16 ` Chao Gao
  2023-05-24  8:19   ` Xiaoyao Li
  2023-06-06 17:26 ` [PATCH v2 0/3] MSR_IA32_ARCH_CAPABILITIES cleanups Sean Christopherson
  3 siblings, 1 reply; 11+ messages in thread
From: Chao Gao @ 2023-05-24  6:16 UTC (permalink / raw)
  To: kvm, x86
  Cc: xiaoyao.li, Chao Gao, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Sean Christopherson,
	Paolo Bonzini, Peter Zijlstra, Kim Phillips, Juergen Gross,
	Ashok Raj, Pawan Gupta, Alexandre Chartre, linux-kernel

KVM open-codes x86_read_arch_cap_msr() in a few places. Eliminate them
by exposing the helper function and using it directly.

No functional change intended.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 arch/x86/kernel/cpu/common.c |  1 +
 arch/x86/kvm/vmx/vmx.c       | 19 +++++--------------
 arch/x86/kvm/x86.c           |  7 +------
 3 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 80710a68ef7d..b34dd3f3e6c4 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1315,6 +1315,7 @@ u64 x86_read_arch_cap_msr(void)
 
 	return ia32_cap;
 }
+EXPORT_SYMBOL_GPL(x86_read_arch_cap_msr);
 
 static bool arch_cap_mmio_immune(u64 ia32_cap)
 {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8274ef5e89e5..3dec4a4f637e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -255,14 +255,9 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 		return 0;
 	}
 
-	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
-		u64 msr;
-
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
-		if (msr & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
-			l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
-			return 0;
-		}
+	if (x86_read_arch_cap_msr() & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
+		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
+		return 0;
 	}
 
 	/* If set to auto use the default l1tf mitigation method */
@@ -394,13 +389,9 @@ static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
 
 static void vmx_setup_fb_clear_ctrl(void)
 {
-	u64 msr;
-
-	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES) &&
-	    !boot_cpu_has_bug(X86_BUG_MDS) &&
+	if (!boot_cpu_has_bug(X86_BUG_MDS) &&
 	    !boot_cpu_has_bug(X86_BUG_TAA)) {
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
-		if (msr & ARCH_CAP_FB_CLEAR_CTRL)
+		if (x86_read_arch_cap_msr() & ARCH_CAP_FB_CLEAR_CTRL)
 			vmx_fb_clear_ctrl_available = true;
 	}
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f7838260c183..b1bdb84ac10f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1612,12 +1612,7 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
 
 static u64 kvm_get_arch_capabilities(void)
 {
-	u64 data = 0;
-
-	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);
-		data &= KVM_SUPPORTED_ARCH_CAP;
-	}
+	u64 data = x86_read_arch_cap_msr() & KVM_SUPPORTED_ARCH_CAP;
 
 	/*
 	 * If nx_huge_pages is enabled, KVM's shadow paging will ensure that
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps
  2023-05-24  6:16 ` [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps Chao Gao
@ 2023-05-24  8:14   ` Xiaoyao Li
  2023-05-24  8:32     ` Chao Gao
  2023-06-06 16:40   ` Sean Christopherson
  1 sibling, 1 reply; 11+ messages in thread
From: Xiaoyao Li @ 2023-05-24  8:14 UTC (permalink / raw)
  To: Chao Gao, kvm, x86
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, linux-kernel

On 5/24/2023 2:16 PM, Chao Gao wrote:
> to avoid computing the supported value at runtime every time.
> 
> Toggle the ARCH_CAP_SKIP_VMENTRY_L1DFLUSH bit when l1tf_vmx_mitigation
> is modified to achieve the same result as runtime computing.

It's not the same result.

In kvm_get_arch_capabilities(), host's value is honored. I.e., when host 
supports ARCH_CAP_SKIP_VMENTRY_L1DFLUSH, l1tf_vmx_mitigation doesn't 
make any difference to the result.

> Opportunistically, add a comment to document the problem of allowing
> changing the supported value of ARCH_CAPABILITIES and the reason why
> we don't fix it.
> 
> No functional change intended.
> 
> Link: https://lore.kernel.org/all/ZGZhW%2Fx5OWPmx1qD@google.com/
> Link: https://lore.kernel.org/all/ZGeU9sYTPxqNGSqI@google.com/
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>   arch/x86/kvm/vmx/vmx.c | 25 +++++++++++++++++++++++--
>   arch/x86/kvm/x86.c     |  7 ++++---
>   arch/x86/kvm/x86.h     |  1 +
>   3 files changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 44fb619803b8..8274ef5e89e5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -309,10 +309,31 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
>   
>   	l1tf_vmx_mitigation = l1tf;
>   
> -	if (l1tf != VMENTER_L1D_FLUSH_NEVER)
> +	/*
> +	 * Update static keys and supported arch capabilities according to
> +	 * the new mitigation state.
> +	 *
> +	 * ARCH_CAP_SKIP_VMENTRY_L1DFLUSH is toggled because if we do cache
> +	 * flushes for L1 guests on (nested) vmlaunch/vmresume to L2, L1
> +	 * guests can skip the flush and if we don't, then L1 guests need
> +	 * to do a flush.
> +	 *
> +	 * Toggling ARCH_CAP_SKIP_VMENTRY_L1DFLUSH may present inconsistent
> +	 * model to the guest, e.g., if userspace isn't careful, a VM can
> +	 * have vCPUs with different values for ARCH_CAPABILITIES. But
> +	 * there is almost no chance to fix the issue. Because, to present
> +	 * a consistent model, KVM essentially needs to disallow changing
> +	 * the module param after VMs/vCPUs have been created, but that
> +	 * would prevent userspace from toggling the param while VMs are
> +	 * running, e.g., in response to a new vulnerability.
> +	 */
> +	if (l1tf != VMENTER_L1D_FLUSH_NEVER) {
>   		static_branch_enable(&vmx_l1d_should_flush);
> -	else
> +		kvm_caps.supported_arch_cap |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
> +	} else {
>   		static_branch_disable(&vmx_l1d_should_flush);
> +		kvm_caps.supported_arch_cap &= ~ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
> +	}
>   
>   	if (l1tf == VMENTER_L1D_FLUSH_COND)
>   		static_branch_enable(&vmx_l1d_flush_cond);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c0778ca39650..2408b5f554b7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1672,7 +1672,7 @@ static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
>   {
>   	switch (msr->index) {
>   	case MSR_IA32_ARCH_CAPABILITIES:
> -		msr->data = kvm_get_arch_capabilities();
> +		msr->data = kvm_caps.supported_arch_cap;
>   		break;
>   	case MSR_IA32_PERF_CAPABILITIES:
>   		msr->data = kvm_caps.supported_perf_cap;
> @@ -7156,7 +7156,7 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>   			return;
>   		break;
>   	case MSR_IA32_TSX_CTRL:
> -		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
> +		if (!(kvm_caps.supported_arch_cap & ARCH_CAP_TSX_CTRL_MSR))
>   			return;
>   		break;
>   	default:
> @@ -9532,6 +9532,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>   		kvm_caps.max_guest_tsc_khz = max;
>   	}
>   	kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
> +	kvm_caps.supported_arch_cap = kvm_get_arch_capabilities();
>   	kvm_init_msr_lists();
>   	return 0;
>   
> @@ -11895,7 +11896,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>   	if (r)
>   		goto free_guest_fpu;
>   
> -	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
> +	vcpu->arch.arch_capabilities = kvm_caps.supported_arch_cap;
>   	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
>   	kvm_xen_init_vcpu(vcpu);
>   	kvm_vcpu_mtrr_init(vcpu);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index c544602d07a3..d3e524bcc169 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -29,6 +29,7 @@ struct kvm_caps {
>   	u64 supported_xcr0;
>   	u64 supported_xss;
>   	u64 supported_perf_cap;
> +	u64 supported_arch_cap;
>   };
>   
>   void kvm_spurious_fault(void);


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 3/3] x86/cpu, KVM: Use helper function to read MSR_IA32_ARCH_CAPABILITIES
  2023-05-24  6:16 ` [PATCH v2 3/3] x86/cpu, KVM: Use helper function to read MSR_IA32_ARCH_CAPABILITIES Chao Gao
@ 2023-05-24  8:19   ` Xiaoyao Li
  0 siblings, 0 replies; 11+ messages in thread
From: Xiaoyao Li @ 2023-05-24  8:19 UTC (permalink / raw)
  To: Chao Gao, kvm, x86
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Sean Christopherson, Paolo Bonzini,
	Peter Zijlstra, Kim Phillips, Juergen Gross, Ashok Raj,
	Pawan Gupta, Alexandre Chartre, linux-kernel

On 5/24/2023 2:16 PM, Chao Gao wrote:
> KVM open-codes x86_read_arch_cap_msr() in a few places. Eliminate them
> by exposing the helper function and using it directly.
> 
> No functional change intended.
> 

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>   arch/x86/kernel/cpu/common.c |  1 +
>   arch/x86/kvm/vmx/vmx.c       | 19 +++++--------------
>   arch/x86/kvm/x86.c           |  7 +------
>   3 files changed, 7 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 80710a68ef7d..b34dd3f3e6c4 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1315,6 +1315,7 @@ u64 x86_read_arch_cap_msr(void)
>   
>   	return ia32_cap;
>   }
> +EXPORT_SYMBOL_GPL(x86_read_arch_cap_msr);
>   
>   static bool arch_cap_mmio_immune(u64 ia32_cap)
>   {
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 8274ef5e89e5..3dec4a4f637e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -255,14 +255,9 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
>   		return 0;
>   	}
>   
> -	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
> -		u64 msr;
> -
> -		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
> -		if (msr & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
> -			l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
> -			return 0;
> -		}
> +	if (x86_read_arch_cap_msr() & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
> +		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
> +		return 0;
>   	}
>   
>   	/* If set to auto use the default l1tf mitigation method */
> @@ -394,13 +389,9 @@ static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
>   
>   static void vmx_setup_fb_clear_ctrl(void)
>   {
> -	u64 msr;
> -
> -	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES) &&
> -	    !boot_cpu_has_bug(X86_BUG_MDS) &&
> +	if (!boot_cpu_has_bug(X86_BUG_MDS) &&
>   	    !boot_cpu_has_bug(X86_BUG_TAA)) {
> -		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
> -		if (msr & ARCH_CAP_FB_CLEAR_CTRL)
> +		if (x86_read_arch_cap_msr() & ARCH_CAP_FB_CLEAR_CTRL)
>   			vmx_fb_clear_ctrl_available = true;
>   	}
>   }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f7838260c183..b1bdb84ac10f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1612,12 +1612,7 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
>   
>   static u64 kvm_get_arch_capabilities(void)
>   {
> -	u64 data = 0;
> -
> -	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
> -		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);
> -		data &= KVM_SUPPORTED_ARCH_CAP;
> -	}
> +	u64 data = x86_read_arch_cap_msr() & KVM_SUPPORTED_ARCH_CAP;
>   
>   	/*
>   	 * If nx_huge_pages is enabled, KVM's shadow paging will ensure that


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] KVM: x86: Correct the name for skipping VMENTER l1d flush
  2023-05-24  6:16 ` [PATCH v2 2/3] KVM: x86: Correct the name for skipping VMENTER l1d flush Chao Gao
@ 2023-05-24  8:25   ` Xiaoyao Li
  0 siblings, 0 replies; 11+ messages in thread
From: Xiaoyao Li @ 2023-05-24  8:25 UTC (permalink / raw)
  To: Chao Gao, kvm, x86
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, linux-kernel

On 5/24/2023 2:16 PM, Chao Gao wrote:
> There is no VMENTER_L1D_FLUSH_NESTED_VM. It should be
> ARCH_CAP_SKIP_VMENTRY_L1DFLUSH.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---
>   arch/x86/kvm/x86.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2408b5f554b7..f7838260c183 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1631,7 +1631,7 @@ static u64 kvm_get_arch_capabilities(void)
>   	 * If we're doing cache flushes (either "always" or "cond")
>   	 * we will do one whenever the guest does a vmlaunch/vmresume.
>   	 * If an outer hypervisor is doing the cache flush for us
> -	 * (VMENTER_L1D_FLUSH_NESTED_VM), we can safely pass that
> +	 * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
>   	 * capability to the guest too, and if EPT is disabled we're not
>   	 * vulnerable.  Overall, only VMENTER_L1D_FLUSH_NEVER will
>   	 * require a nested hypervisor to do a flush of its own.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps
  2023-05-24  8:14   ` Xiaoyao Li
@ 2023-05-24  8:32     ` Chao Gao
  2023-05-25  2:17       ` Xiaoyao Li
  0 siblings, 1 reply; 11+ messages in thread
From: Chao Gao @ 2023-05-24  8:32 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: kvm, x86, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	linux-kernel

On Wed, May 24, 2023 at 04:14:10PM +0800, Xiaoyao Li wrote:
>On 5/24/2023 2:16 PM, Chao Gao wrote:
>> to avoid computing the supported value at runtime every time.
>> 
>> Toggle the ARCH_CAP_SKIP_VMENTRY_L1DFLUSH bit when l1tf_vmx_mitigation
>> is modified to achieve the same result as runtime computing.
>
>It's not the same result.

it is because ...

>
>In kvm_get_arch_capabilities(), host's value is honored. I.e., when host
>supports ARCH_CAP_SKIP_VMENTRY_L1DFLUSH, l1tf_vmx_mitigation doesn't make any
>difference to the result.

... l1tf_vmx_mitigation should be VMENTER_L1D_FLUSH_NOT_REQUIRED in this
case. l1tf_vmx_mitigation cannot be VMENTER_L1D_FLUSH_NEVER.

>
>> Opportunistically, add a comment to document the problem of allowing
>> changing the supported value of ARCH_CAPABILITIES and the reason why
>> we don't fix it.
>> 
>> No functional change intended.
>> 
>> Link: https://lore.kernel.org/all/ZGZhW%2Fx5OWPmx1qD@google.com/
>> Link: https://lore.kernel.org/all/ZGeU9sYTPxqNGSqI@google.com/
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>>   arch/x86/kvm/vmx/vmx.c | 25 +++++++++++++++++++++++--
>>   arch/x86/kvm/x86.c     |  7 ++++---
>>   arch/x86/kvm/x86.h     |  1 +
>>   3 files changed, 28 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 44fb619803b8..8274ef5e89e5 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -309,10 +309,31 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
>>   	l1tf_vmx_mitigation = l1tf;
>> -	if (l1tf != VMENTER_L1D_FLUSH_NEVER)
>> +	/*
>> +	 * Update static keys and supported arch capabilities according to
>> +	 * the new mitigation state.
>> +	 *
>> +	 * ARCH_CAP_SKIP_VMENTRY_L1DFLUSH is toggled because if we do cache
>> +	 * flushes for L1 guests on (nested) vmlaunch/vmresume to L2, L1
>> +	 * guests can skip the flush and if we don't, then L1 guests need
>> +	 * to do a flush.
>> +	 *
>> +	 * Toggling ARCH_CAP_SKIP_VMENTRY_L1DFLUSH may present inconsistent
>> +	 * model to the guest, e.g., if userspace isn't careful, a VM can
>> +	 * have vCPUs with different values for ARCH_CAPABILITIES. But
>> +	 * there is almost no chance to fix the issue. Because, to present
>> +	 * a consistent model, KVM essentially needs to disallow changing
>> +	 * the module param after VMs/vCPUs have been created, but that
>> +	 * would prevent userspace from toggling the param while VMs are
>> +	 * running, e.g., in response to a new vulnerability.
>> +	 */
>> +	if (l1tf != VMENTER_L1D_FLUSH_NEVER) {
>>   		static_branch_enable(&vmx_l1d_should_flush);
>> -	else
>> +		kvm_caps.supported_arch_cap |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
>> +	} else {
>>   		static_branch_disable(&vmx_l1d_should_flush);
>> +		kvm_caps.supported_arch_cap &= ~ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
>> +	}
>>   	if (l1tf == VMENTER_L1D_FLUSH_COND)
>>   		static_branch_enable(&vmx_l1d_flush_cond);
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index c0778ca39650..2408b5f554b7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1672,7 +1672,7 @@ static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
>>   {
>>   	switch (msr->index) {
>>   	case MSR_IA32_ARCH_CAPABILITIES:
>> -		msr->data = kvm_get_arch_capabilities();
>> +		msr->data = kvm_caps.supported_arch_cap;
>>   		break;
>>   	case MSR_IA32_PERF_CAPABILITIES:
>>   		msr->data = kvm_caps.supported_perf_cap;
>> @@ -7156,7 +7156,7 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>>   			return;
>>   		break;
>>   	case MSR_IA32_TSX_CTRL:
>> -		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
>> +		if (!(kvm_caps.supported_arch_cap & ARCH_CAP_TSX_CTRL_MSR))
>>   			return;
>>   		break;
>>   	default:
>> @@ -9532,6 +9532,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>>   		kvm_caps.max_guest_tsc_khz = max;
>>   	}
>>   	kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
>> +	kvm_caps.supported_arch_cap = kvm_get_arch_capabilities();
>>   	kvm_init_msr_lists();
>>   	return 0;
>> @@ -11895,7 +11896,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>>   	if (r)
>>   		goto free_guest_fpu;
>> -	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
>> +	vcpu->arch.arch_capabilities = kvm_caps.supported_arch_cap;
>>   	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
>>   	kvm_xen_init_vcpu(vcpu);
>>   	kvm_vcpu_mtrr_init(vcpu);
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index c544602d07a3..d3e524bcc169 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -29,6 +29,7 @@ struct kvm_caps {
>>   	u64 supported_xcr0;
>>   	u64 supported_xss;
>>   	u64 supported_perf_cap;
>> +	u64 supported_arch_cap;
>>   };
>>   void kvm_spurious_fault(void);
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps
  2023-05-24  8:32     ` Chao Gao
@ 2023-05-25  2:17       ` Xiaoyao Li
  0 siblings, 0 replies; 11+ messages in thread
From: Xiaoyao Li @ 2023-05-25  2:17 UTC (permalink / raw)
  To: Chao Gao
  Cc: kvm, x86, Sean Christopherson, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, H. Peter Anvin,
	linux-kernel

On 5/24/2023 4:32 PM, Chao Gao wrote:
> On Wed, May 24, 2023 at 04:14:10PM +0800, Xiaoyao Li wrote:
>> On 5/24/2023 2:16 PM, Chao Gao wrote:
>>> to avoid computing the supported value at runtime every time.
>>>
>>> Toggle the ARCH_CAP_SKIP_VMENTRY_L1DFLUSH bit when l1tf_vmx_mitigation
>>> is modified to achieve the same result as runtime computing.
>>
>> It's not the same result.
> 
> it is because ...
> 
>>
>> In kvm_get_arch_capabilities(), host's value is honored. I.e., when host
>> supports ARCH_CAP_SKIP_VMENTRY_L1DFLUSH, l1tf_vmx_mitigation doesn't make any
>> difference to the result.
> 
> ... l1tf_vmx_mitigation should be VMENTER_L1D_FLUSH_NOT_REQUIRED in this
> case. l1tf_vmx_mitigation cannot be VMENTER_L1D_FLUSH_NEVER.

yes. you are right. Maybe we can clarify it in the changelog

anyway,

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>

>>
>>> Opportunistically, add a comment to document the problem of allowing
>>> changing the supported value of ARCH_CAPABILITIES and the reason why
>>> we don't fix it.
>>>
>>> No functional change intended.
>>>
>>> Link: https://lore.kernel.org/all/ZGZhW%2Fx5OWPmx1qD@google.com/
>>> Link: https://lore.kernel.org/all/ZGeU9sYTPxqNGSqI@google.com/
>>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>>> ---
>>>    arch/x86/kvm/vmx/vmx.c | 25 +++++++++++++++++++++++--
>>>    arch/x86/kvm/x86.c     |  7 ++++---
>>>    arch/x86/kvm/x86.h     |  1 +
>>>    3 files changed, 28 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>> index 44fb619803b8..8274ef5e89e5 100644
>>> --- a/arch/x86/kvm/vmx/vmx.c
>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>> @@ -309,10 +309,31 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
>>>    	l1tf_vmx_mitigation = l1tf;
>>> -	if (l1tf != VMENTER_L1D_FLUSH_NEVER)
>>> +	/*
>>> +	 * Update static keys and supported arch capabilities according to
>>> +	 * the new mitigation state.
>>> +	 *
>>> +	 * ARCH_CAP_SKIP_VMENTRY_L1DFLUSH is toggled because if we do cache
>>> +	 * flushes for L1 guests on (nested) vmlaunch/vmresume to L2, L1
>>> +	 * guests can skip the flush and if we don't, then L1 guests need
>>> +	 * to do a flush.
>>> +	 *
>>> +	 * Toggling ARCH_CAP_SKIP_VMENTRY_L1DFLUSH may present inconsistent
>>> +	 * model to the guest, e.g., if userspace isn't careful, a VM can
>>> +	 * have vCPUs with different values for ARCH_CAPABILITIES. But
>>> +	 * there is almost no chance to fix the issue. Because, to present
>>> +	 * a consistent model, KVM essentially needs to disallow changing
>>> +	 * the module param after VMs/vCPUs have been created, but that
>>> +	 * would prevent userspace from toggling the param while VMs are
>>> +	 * running, e.g., in response to a new vulnerability.
>>> +	 */
>>> +	if (l1tf != VMENTER_L1D_FLUSH_NEVER) {
>>>    		static_branch_enable(&vmx_l1d_should_flush);
>>> -	else
>>> +		kvm_caps.supported_arch_cap |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
>>> +	} else {
>>>    		static_branch_disable(&vmx_l1d_should_flush);
>>> +		kvm_caps.supported_arch_cap &= ~ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
>>> +	}
>>>    	if (l1tf == VMENTER_L1D_FLUSH_COND)
>>>    		static_branch_enable(&vmx_l1d_flush_cond);
>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>> index c0778ca39650..2408b5f554b7 100644
>>> --- a/arch/x86/kvm/x86.c
>>> +++ b/arch/x86/kvm/x86.c
>>> @@ -1672,7 +1672,7 @@ static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
>>>    {
>>>    	switch (msr->index) {
>>>    	case MSR_IA32_ARCH_CAPABILITIES:
>>> -		msr->data = kvm_get_arch_capabilities();
>>> +		msr->data = kvm_caps.supported_arch_cap;
>>>    		break;
>>>    	case MSR_IA32_PERF_CAPABILITIES:
>>>    		msr->data = kvm_caps.supported_perf_cap;
>>> @@ -7156,7 +7156,7 @@ static void kvm_probe_msr_to_save(u32 msr_index)
>>>    			return;
>>>    		break;
>>>    	case MSR_IA32_TSX_CTRL:
>>> -		if (!(kvm_get_arch_capabilities() & ARCH_CAP_TSX_CTRL_MSR))
>>> +		if (!(kvm_caps.supported_arch_cap & ARCH_CAP_TSX_CTRL_MSR))
>>>    			return;
>>>    		break;
>>>    	default:
>>> @@ -9532,6 +9532,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>>>    		kvm_caps.max_guest_tsc_khz = max;
>>>    	}
>>>    	kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
>>> +	kvm_caps.supported_arch_cap = kvm_get_arch_capabilities();
>>>    	kvm_init_msr_lists();
>>>    	return 0;
>>> @@ -11895,7 +11896,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>>>    	if (r)
>>>    		goto free_guest_fpu;
>>> -	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
>>> +	vcpu->arch.arch_capabilities = kvm_caps.supported_arch_cap;
>>>    	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
>>>    	kvm_xen_init_vcpu(vcpu);
>>>    	kvm_vcpu_mtrr_init(vcpu);
>>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>>> index c544602d07a3..d3e524bcc169 100644
>>> --- a/arch/x86/kvm/x86.h
>>> +++ b/arch/x86/kvm/x86.h
>>> @@ -29,6 +29,7 @@ struct kvm_caps {
>>>    	u64 supported_xcr0;
>>>    	u64 supported_xss;
>>>    	u64 supported_perf_cap;
>>> +	u64 supported_arch_cap;
>>>    };
>>>    void kvm_spurious_fault(void);
>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps
  2023-05-24  6:16 ` [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps Chao Gao
  2023-05-24  8:14   ` Xiaoyao Li
@ 2023-06-06 16:40   ` Sean Christopherson
  1 sibling, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2023-06-06 16:40 UTC (permalink / raw)
  To: Chao Gao
  Cc: kvm, x86, xiaoyao.li, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, linux-kernel

On Wed, May 24, 2023, Chao Gao wrote:
> @@ -9532,6 +9532,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  		kvm_caps.max_guest_tsc_khz = max;
>  	}
>  	kvm_caps.default_tsc_scaling_ratio = 1ULL << kvm_caps.tsc_scaling_ratio_frac_bits;
> +	kvm_caps.supported_arch_cap = kvm_get_arch_capabilities();
>  	kvm_init_msr_lists();
>  	return 0;
>  
> @@ -11895,7 +11896,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  	if (r)
>  		goto free_guest_fpu;
>  
> -	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
> +	vcpu->arch.arch_capabilities = kvm_caps.supported_arch_cap;
>  	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
>  	kvm_xen_init_vcpu(vcpu);
>  	kvm_vcpu_mtrr_init(vcpu);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index c544602d07a3..d3e524bcc169 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -29,6 +29,7 @@ struct kvm_caps {
>  	u64 supported_xcr0;
>  	u64 supported_xss;
>  	u64 supported_perf_cap;
> +	u64 supported_arch_cap;

Hrm, I take back my earlier vote about using a dynamic snapshot.

"supported" isn't quite right.  KVM always "supports" advertising SKIP_VMENTRY_L1DFLUSH
to the guest.  And KVM really does treat the MSR like a CPUID leaf, in that KVM
doesn't sanity check the value coming in from userspace.  Whether or not that's
a good idea is debatable, but it is was it is.  The value is more like KVM's current
default.

Looking at all the uses of both the default/supported value, and the host MSR,
I think it makes more sense to snapshot the host value than it does to snapshot
and update the default/supported value.  The default value is used only when a
vCPU is created and when userspace does a system-scoped KVM_GET_MSRS, i.e. avoiding
the RDMSR is nice, but making the read super fast isn't necessary, e.g. the overhead
of the boot_cpu_has() and boot_cpu_has_bug() checks is negligible.

And if KVM snapshots the MSR, the other usage of the host value can be cleaned up
too.

I'm leaning towards doing this instead of patches [1/3] and [3/3]:

From: Sean Christopherson <seanjc@google.com>
Date: Tue, 6 Jun 2023 09:20:31 -0700
Subject: [PATCH 1/2] KVM: x86: Snapshot host's MSR_IA32_ARCH_CAPABILITIES

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 22 ++++++----------------
 arch/x86/kvm/x86.c     | 13 +++++++------
 arch/x86/kvm/x86.h     |  1 +
 3 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2d9d155691a7..42d1148f933c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -255,14 +255,9 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 		return 0;
 	}
 
-	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
-		u64 msr;
-
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
-		if (msr & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
-			l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
-			return 0;
-		}
+	if (host_arch_capabilities & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH) {
+		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NOT_REQUIRED;
+		return 0;
 	}
 
 	/* If set to auto use the default l1tf mitigation method */
@@ -373,15 +368,10 @@ static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
 
 static void vmx_setup_fb_clear_ctrl(void)
 {
-	u64 msr;
-
-	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES) &&
+	if ((host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
 	    !boot_cpu_has_bug(X86_BUG_MDS) &&
-	    !boot_cpu_has_bug(X86_BUG_TAA)) {
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
-		if (msr & ARCH_CAP_FB_CLEAR_CTRL)
-			vmx_fb_clear_ctrl_available = true;
-	}
+	    !boot_cpu_has_bug(X86_BUG_TAA))
+		vmx_fb_clear_ctrl_available = true;
 }
 
 static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7c7be4815eaa..7c2e796fa460 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -237,6 +237,9 @@ EXPORT_SYMBOL_GPL(enable_apicv);
 u64 __read_mostly host_xss;
 EXPORT_SYMBOL_GPL(host_xss);
 
+u64 __read_mostly host_arch_capabilities;
+EXPORT_SYMBOL_GPL(host_arch_capabilities);
+
 const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS(),
 	STATS_DESC_COUNTER(VM, mmu_shadow_zapped),
@@ -1612,12 +1615,7 @@ static bool kvm_is_immutable_feature_msr(u32 msr)
 
 static u64 kvm_get_arch_capabilities(void)
 {
-	u64 data = 0;
-
-	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);
-		data &= KVM_SUPPORTED_ARCH_CAP;
-	}
+	u64 data = host_arch_capabilities & KVM_SUPPORTED_ARCH_CAP;
 
 	/*
 	 * If nx_huge_pages is enabled, KVM's shadow paging will ensure that
@@ -9492,6 +9490,9 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	kvm_init_pmu_capability(ops->pmu_ops);
 
+	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, host_arch_capabilities);
+
 	r = ops->hardware_setup();
 	if (r != 0)
 		goto out_mmu_exit;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 82e3dafc5453..1e7be1f6ab29 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -323,6 +323,7 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu);
 
 extern u64 host_xcr0;
 extern u64 host_xss;
+extern u64 host_arch_capabilities;
 
 extern struct kvm_caps kvm_caps;
 

base-commit: 02f1b0b736606f9870595b3089d9c124f9da8be9
-- 
2.41.0.rc2.161.g9c6817b8e7-goog

From bd4b41fea5d5fa0347c5ef1ffb73dbdf783b23d0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Tue, 6 Jun 2023 09:25:55 -0700
Subject: [PATCH 2/2] KVM: VMX: Drop unnecessary vmx_fb_clear_ctrl_available
 "cache"

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.c | 17 +++--------------
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 42d1148f933c..17003660138a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -237,9 +237,6 @@ static const struct {
 #define L1D_CACHE_ORDER 4
 static void *vmx_l1d_flush_pages;
 
-/* Control for disabling CPU Fill buffer clear */
-static bool __read_mostly vmx_fb_clear_ctrl_available;
-
 static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 {
 	struct page *page;
@@ -366,14 +363,6 @@ static int vmentry_l1d_flush_get(char *s, const struct kernel_param *kp)
 	return sprintf(s, "%s\n", vmentry_l1d_param[l1tf_vmx_mitigation].option);
 }
 
-static void vmx_setup_fb_clear_ctrl(void)
-{
-	if ((host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
-	    !boot_cpu_has_bug(X86_BUG_MDS) &&
-	    !boot_cpu_has_bug(X86_BUG_TAA))
-		vmx_fb_clear_ctrl_available = true;
-}
-
 static __always_inline void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
 {
 	u64 msr;
@@ -399,7 +388,9 @@ static __always_inline void vmx_enable_fb_clear(struct vcpu_vmx *vmx)
 
 static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
 {
-	vmx->disable_fb_clear = vmx_fb_clear_ctrl_available;
+	vmx->disable_fb_clear = (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
+				!boot_cpu_has_bug(X86_BUG_MDS) &&
+				!boot_cpu_has_bug(X86_BUG_TAA);
 
 	/*
 	 * If guest will not execute VERW, there is no need to set FB_CLEAR_DIS
@@ -8580,8 +8571,6 @@ static int __init vmx_init(void)
 	if (r)
 		goto err_l1d_flush;
 
-	vmx_setup_fb_clear_ctrl();
-
 	for_each_possible_cpu(cpu) {
 		INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
 
-- 




^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] MSR_IA32_ARCH_CAPABILITIES cleanups
  2023-05-24  6:16 [PATCH v2 0/3] MSR_IA32_ARCH_CAPABILITIES cleanups Chao Gao
                   ` (2 preceding siblings ...)
  2023-05-24  6:16 ` [PATCH v2 3/3] x86/cpu, KVM: Use helper function to read MSR_IA32_ARCH_CAPABILITIES Chao Gao
@ 2023-06-06 17:26 ` Sean Christopherson
  3 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2023-06-06 17:26 UTC (permalink / raw)
  To: Sean Christopherson, kvm, x86, Chao Gao
  Cc: xiaoyao.li, Alexandre Chartre, Ashok Raj, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Ingo Molnar, Juergen Gross,
	Kim Phillips, linux-kernel, Paolo Bonzini, Pawan Gupta,
	Peter Zijlstra, Thomas Gleixner

On Wed, 24 May 2023 14:16:30 +0800, Chao Gao wrote:
> Do some cleanups about MSR_IA32_ARCH_CAPABILITIES in KVM e.g, fix stale
> comments, elimite open-coded MSR reads, track features that can be exposed to
> guests to avoid computing them runtime. They are found when reading the code.
> 
> No functional change intented.
> 
> Changes from v1 -> v2:
> 1. toggle ARCH_CAP_SKIP_VMENTRY_L1DFLUSH bit when l1tf mitigation state is
>    changed by module param. [Xiaoyao, Sean]
> 2. add more cleanups (patch 2/3)
> 
> [...]

Applied [2/3] to kvm-x86 misc, no need for that one to wait for the dust to
settle on the "supported" snapshot.  Thanks!

[2/3] KVM: x86: Correct the name for skipping VMENTER l1d flush
      https://github.com/kvm-x86/linux/commit/02f1b0b73660

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-06-06 17:31 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-24  6:16 [PATCH v2 0/3] MSR_IA32_ARCH_CAPABILITIES cleanups Chao Gao
2023-05-24  6:16 ` [PATCH v2 1/3] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps Chao Gao
2023-05-24  8:14   ` Xiaoyao Li
2023-05-24  8:32     ` Chao Gao
2023-05-25  2:17       ` Xiaoyao Li
2023-06-06 16:40   ` Sean Christopherson
2023-05-24  6:16 ` [PATCH v2 2/3] KVM: x86: Correct the name for skipping VMENTER l1d flush Chao Gao
2023-05-24  8:25   ` Xiaoyao Li
2023-05-24  6:16 ` [PATCH v2 3/3] x86/cpu, KVM: Use helper function to read MSR_IA32_ARCH_CAPABILITIES Chao Gao
2023-05-24  8:19   ` Xiaoyao Li
2023-06-06 17:26 ` [PATCH v2 0/3] MSR_IA32_ARCH_CAPABILITIES cleanups Sean Christopherson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.