public inbox for kvm@vger.kernel.org
* [PATCH v2 0/2] Relax canonical checks on some arch msrs
@ 2024-08-02 15:16 Maxim Levitsky
  2024-08-02 15:16 ` [PATCH v2 1/2] KVM: x86: relax canonical check for some x86 architectural msrs Maxim Levitsky
  2024-08-02 15:16 ` [PATCH v2 2/2] KVM: SVM: fix emulation of msr reads/writes of MSR_FS_BASE and MSR_GS_BASE Maxim Levitsky
  0 siblings, 2 replies; 9+ messages in thread
From: Maxim Levitsky @ 2024-08-02 15:16 UTC (permalink / raw)
  To: kvm
  Cc: Sean Christopherson, linux-kernel, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, x86, Thomas Gleixner,
	Dave Hansen, Maxim Levitsky

Recently we came upon a failure where the guest likely writes
0xff4547ceb1600000 to MSR_KERNEL_GS_BASE; later on, QEMU sets this
value via KVM_PUT_MSRS, and the write is rejected by the kernel,
likely because the value is not canonical under 4-level paging.

I did some reverse engineering and, to my surprise, found that both
Intel and AMD perform very loose checks on non-canonical addresses
written to this and several other MSRs when the CPU supports 5-level
paging.

Patch #1 addresses this by making KVM tolerate such values.

Patch #2 fixes a semi-theoretical bug found while debugging the issue.

V2: addressed very good feedback from Chao Gao. Thanks!

Best regards,
	Maxim Levitsky

Maxim Levitsky (2):
  KVM: x86: relax canonical check for some x86 architectural msrs
  KVM: SVM: fix emulation of msr reads/writes of MSR_FS_BASE and
    MSR_GS_BASE

 arch/x86/kvm/svm/svm.c | 12 ++++++++++++
 arch/x86/kvm/x86.c     | 11 ++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

-- 
2.40.1




* [PATCH v2 1/2] KVM: x86: relax canonical check for some x86 architectural msrs
  2024-08-02 15:16 [PATCH v2 0/2] Relax canonical checks on some arch msrs Maxim Levitsky
@ 2024-08-02 15:16 ` Maxim Levitsky
  2024-08-02 15:53   ` Sean Christopherson
  2024-08-02 15:16 ` [PATCH v2 2/2] KVM: SVM: fix emulation of msr reads/writes of MSR_FS_BASE and MSR_GS_BASE Maxim Levitsky
  1 sibling, 1 reply; 9+ messages in thread
From: Maxim Levitsky @ 2024-08-02 15:16 UTC (permalink / raw)
  To: kvm
  Cc: Sean Christopherson, linux-kernel, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, x86, Thomas Gleixner,
	Dave Hansen, Maxim Levitsky, Chao Gao

Several architectural MSRs (e.g. MSR_KERNEL_GS_BASE) must contain a
canonical address, and according to the Intel PRM this is enforced by
a canonicality check that raises #GP on MSR write.

However, as it turns out, the address width used for this canonicality
check is determined only by the host CPU model: if the CPU *supports*
5-level paging, the width is 57 regardless of the state of CR4.LA57.

Experimental tests on a Sapphire Rapids CPU and on a Zen4 CPU confirm
this behavior.

In addition to that, the Intel ISA extension manual mentions that this might
be the architectural behavior:

Architecture Instruction Set Extensions and Future Features Programming Reference [1].
Chapter 6.4:

"CANONICALITY CHECKING FOR DATA ADDRESSES WRITTEN TO CONTROL REGISTERS AND
MSRS"

"In Processors that support LAM continue to require the addresses written to
control registers or MSRs to be 57-bit canonical if the processor _supports_
5-level paging or 48-bit canonical if it supports only 4-level paging"

[1]: https://cdrdv2.intel.com/v1/dl/getContent/671368

Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/x86.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6968eadd418..3582f0bb7644 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1844,7 +1844,16 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
 	case MSR_KERNEL_GS_BASE:
 	case MSR_CSTAR:
 	case MSR_LSTAR:
-		if (is_noncanonical_address(data, vcpu))
+
+		/*
+		 * Both AMD and Intel cpus allow values which
+		 * are canonical in the 5 level paging mode but are not
+		 * canonical in the 4 level paging mode to be written
+		 * to the above MSRs, as long as the host CPU supports
+		 * 5 level paging, regardless of the state of the CR4.LA57.
+		 */
+		if (!__is_canonical_address(data,
+			kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48))
 			return 1;
 		break;
 	case MSR_IA32_SYSENTER_EIP:
-- 
2.40.1



* [PATCH v2 2/2] KVM: SVM: fix emulation of msr reads/writes of MSR_FS_BASE and MSR_GS_BASE
  2024-08-02 15:16 [PATCH v2 0/2] Relax canonical checks on some arch msrs Maxim Levitsky
  2024-08-02 15:16 ` [PATCH v2 1/2] KVM: x86: relax canonical check for some x86 architectural msrs Maxim Levitsky
@ 2024-08-02 15:16 ` Maxim Levitsky
  1 sibling, 0 replies; 9+ messages in thread
From: Maxim Levitsky @ 2024-08-02 15:16 UTC (permalink / raw)
  To: kvm
  Cc: Sean Christopherson, linux-kernel, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, x86, Thomas Gleixner,
	Dave Hansen, Maxim Levitsky

If these MSRs are accessed through the emulator (e.g. due to the
'force emulation' prefix), the SVM code currently fails to read/write
the corresponding segment bases in the VMCB.

Fix that.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c58da281f14f..3fc01ba2bd4a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2875,6 +2875,12 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_CSTAR:
 		msr_info->data = svm->vmcb01.ptr->save.cstar;
 		break;
+	case MSR_GS_BASE:
+		msr_info->data = svm->vmcb01.ptr->save.gs.base;
+		break;
+	case MSR_FS_BASE:
+		msr_info->data = svm->vmcb01.ptr->save.fs.base;
+		break;
 	case MSR_KERNEL_GS_BASE:
 		msr_info->data = svm->vmcb01.ptr->save.kernel_gs_base;
 		break;
@@ -3100,6 +3106,12 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_CSTAR:
 		svm->vmcb01.ptr->save.cstar = data;
 		break;
+	case MSR_GS_BASE:
+		svm->vmcb01.ptr->save.gs.base = data;
+		break;
+	case MSR_FS_BASE:
+		svm->vmcb01.ptr->save.fs.base = data;
+		break;
 	case MSR_KERNEL_GS_BASE:
 		svm->vmcb01.ptr->save.kernel_gs_base = data;
 		break;
-- 
2.40.1



* Re: [PATCH v2 1/2] KVM: x86: relax canonical check for some x86 architectural msrs
  2024-08-02 15:16 ` [PATCH v2 1/2] KVM: x86: relax canonical check for some x86 architectural msrs Maxim Levitsky
@ 2024-08-02 15:53   ` Sean Christopherson
  2024-08-05  6:12     ` Chao Gao
  2024-08-05 11:01     ` mlevitsk
  0 siblings, 2 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-08-02 15:53 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, linux-kernel, Borislav Petkov, H. Peter Anvin, Paolo Bonzini,
	Ingo Molnar, x86, Thomas Gleixner, Dave Hansen, Chao Gao

On Fri, Aug 02, 2024, Maxim Levitsky wrote:
> Several architectural msrs (e.g MSR_KERNEL_GS_BASE) must contain
> a canonical address, and according to Intel PRM, this is enforced
> by a #GP canonical check during MSR write.
> 
> However as it turns out, the supported address width
> used for this canonical check is determined only
> by host cpu model:

Please try to wrap consistently and sanely, this is unnecessarily hard to read
because every paragraph manages to wrap at a different column.

> if CPU *supports* 5 level paging, the width will be 57
> regardless of the state of CR4.LA57.
> 
> Experemental tests on a Sapphire Rapids CPU and on a Zen4 CPU
> confirm this behavior.
> 
> In addition to that, the Intel ISA extension manual mentions that this might
> be the architectural behavior:
> 
> Architecture Instruction Set Extensions and Future Features Programming Reference [1].
> Chapter 6.4:
> 
> "CANONICALITY CHECKING FOR DATA ADDRESSES WRITTEN TO CONTROL REGISTERS AND
> MSRS"
> 
> "In Processors that support LAM continue to require the addresses written to
> control registers or MSRs to be 57-bit canonical if the processor _supports_
> 5-level paging or 48-bit canonical if it supports only 4-level paging"
> 
> [1]: https://cdrdv2.intel.com/v1/dl/getContent/671368
> 
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a6968eadd418..3582f0bb7644 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1844,7 +1844,16 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
>  	case MSR_KERNEL_GS_BASE:
>  	case MSR_CSTAR:
>  	case MSR_LSTAR:
> -		if (is_noncanonical_address(data, vcpu))
> +
> +		/*
> +		 * Both AMD and Intel cpus allow values which
> +		 * are canonical in the 5 level paging mode but are not
> +		 * canonical in the 4 level paging mode to be written
> +		 * to the above MSRs, as long as the host CPU supports
> +		 * 5 level paging, regardless of the state of the CR4.LA57.
> +		 */
> +		if (!__is_canonical_address(data,
> +			kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48))

Please align indentation.

Checking kvm_cpu_cap_has() is wrong.  What the _host_ supports is irrelevant,
what matters is what the guest CPU supports, i.e. this should check guest CPUID.
Ah, but for safety, KVM also needs to check kvm_cpu_cap_has() to prevent faulting
on a bad load into hardware.  Which means adding a "governed" feature until my
CPUID rework lands.

And I'm pretty sure this fix is incomplete, as nVMX's consistency checks on MSRs
that are loaded via dedicated VMCS fields likely need the same treatment, e.g.
presumably these checks should follow the MSR handling.

	if (CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
	    CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_eip, vcpu)))
		return -EINVAL;


	    (CC(is_noncanonical_address(vmcs12->guest_bndcfgs & PAGE_MASK, vcpu)) ||

So I think we probably need a dedicated helper for MSRs.

Hmm, and I suspect these are wrong too, but in a different way.  Toggling host
LA57 on VM-Exit is legal[*], so logically, KVM should use CR4.LA57 from
vmcs12->host_cr4, not the vCPU's current CR4 value.  Which makes me _really_
curious if Intel CPUs actually get that right.

	if (CC(is_noncanonical_address(vmcs12->host_fs_base, vcpu)) ||
	    CC(is_noncanonical_address(vmcs12->host_gs_base, vcpu)) ||
	    CC(is_noncanonical_address(vmcs12->host_gdtr_base, vcpu)) ||
	    CC(is_noncanonical_address(vmcs12->host_idtr_base, vcpu)) ||
	    CC(is_noncanonical_address(vmcs12->host_tr_base, vcpu)) ||
	    CC(is_noncanonical_address(vmcs12->host_rip, vcpu)))
		return -EINVAL;

[*] https://lore.kernel.org/all/20210622211124.3698119-1-seanjc@google.com


* Re: [PATCH v2 1/2] KVM: x86: relax canonical check for some x86 architectural msrs
  2024-08-02 15:53   ` Sean Christopherson
@ 2024-08-05  6:12     ` Chao Gao
  2024-08-05 11:01     ` mlevitsk
  1 sibling, 0 replies; 9+ messages in thread
From: Chao Gao @ 2024-08-05  6:12 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Maxim Levitsky, kvm, linux-kernel, Borislav Petkov,
	H. Peter Anvin, Paolo Bonzini, Ingo Molnar, x86, Thomas Gleixner,
	Dave Hansen

>Checking kvm_cpu_cap_has() is wrong.  What the _host_ supports is irrelevant,
>what matters is what the guest CPU supports, i.e. this should check guest CPUID.
>Ah, but for safety, KVM also needs to check kvm_cpu_cap_has() to prevent faulting
>on a bad load into hardware.  Which means adding a "governed" feature until my
>CPUID rework lands.
>
>And I'm pretty sure this fix is incomplete, as nVMX's consistency checks on MSRs
>that are loaded via dedicated VMCS fields likely need the same treatment, e.g.
>presumably these checks should follow the MSR handling.
>
>	if (CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
>	    CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_eip, vcpu)))
>		return -EINVAL;
>
>
>	    (CC(is_noncanonical_address(vmcs12->guest_bndcfgs & PAGE_MASK, vcpu)) ||
>
>So I think we probably need a dedicated helper for MSRs.

+1

>
>Hmm, and I suspect these are wrong too, but in a different way.  Toggling host

Actually they are wrong in the _same_ way because ...

>LA57 on VM-Exit is legal[*], so logically, KVM should use CR4.LA57 from
>vmcs12->host_cr4, not the vCPU's current CR4 value.  Which makes me _really_
>curious if Intel CPUs actually get that right.

... according to Section 3.1 of the 5-level paging whitepaper[1], on VM
entry the canonicality checking of the host/guest state (except guest RIP)
is also based on whether the CPU supports 5-level paging, i.e., host or
guest CR4.LA57 isn't checked.

[1]: https://cdrdv2.intel.com/v1/dl/getContent/671442?fileName=5-level-paging-white-paper.pdf

>
>	if (CC(is_noncanonical_address(vmcs12->host_fs_base, vcpu)) ||
>	    CC(is_noncanonical_address(vmcs12->host_gs_base, vcpu)) ||
>	    CC(is_noncanonical_address(vmcs12->host_gdtr_base, vcpu)) ||
>	    CC(is_noncanonical_address(vmcs12->host_idtr_base, vcpu)) ||
>	    CC(is_noncanonical_address(vmcs12->host_tr_base, vcpu)) ||
>	    CC(is_noncanonical_address(vmcs12->host_rip, vcpu)))

Not sure how host_rip should be handled, because it isn't documented in
that spec.

>		return -EINVAL;
>
>[*] https://lore.kernel.org/all/20210622211124.3698119-1-seanjc@google.com


* Re: [PATCH v2 1/2] KVM: x86: relax canonical check for some x86 architectural msrs
  2024-08-02 15:53   ` Sean Christopherson
  2024-08-05  6:12     ` Chao Gao
@ 2024-08-05 11:01     ` mlevitsk
  2024-08-05 16:39       ` Sean Christopherson
  1 sibling, 1 reply; 9+ messages in thread
From: mlevitsk @ 2024-08-05 11:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Borislav Petkov, H. Peter Anvin, Paolo Bonzini,
	Ingo Molnar, x86, Thomas Gleixner, Dave Hansen, Chao Gao

У пт, 2024-08-02 у 08:53 -0700, Sean Christopherson пише:
> > > > On Fri, Aug 02, 2024, Maxim Levitsky wrote:
> > > > > > > > Several architectural msrs (e.g MSR_KERNEL_GS_BASE) must contain
> > > > > > > > a canonical address, and according to Intel PRM, this is enforced
> > > > > > > > by a #GP canonical check during MSR write.
> > > > > > > > 
> > > > > > > > However as it turns out, the supported address width
> > > > > > > > used for this canonical check is determined only
> > > > > > > > by host cpu model:
> > > > 
> > > > Please try to wrap consistently and sanely, this is unnecessarily hard to read
> > > > because every paragraph manages to wrap at a different column.
I'll take note.

> > > > 
> > > > > > > > if CPU *supports* 5 level paging, the width will be 57
> > > > > > > > regardless of the state of CR4.LA57.
> > > > > > > > 
> > > > > > > > Experemental tests on a Sapphire Rapids CPU and on a Zen4 CPU
> > > > > > > > confirm this behavior.
> > > > > > > > 
> > > > > > > > In addition to that, the Intel ISA extension manual mentions that this might
> > > > > > > > be the architectural behavior:
> > > > > > > > 
> > > > > > > > Architecture Instruction Set Extensions and Future Features Programming Reference [1].
> > > > > > > > Chapter 6.4:
> > > > > > > > 
> > > > > > > > "CANONICALITY CHECKING FOR DATA ADDRESSES WRITTEN TO CONTROL REGISTERS AND
> > > > > > > > MSRS"
> > > > > > > > 
> > > > > > > > "In Processors that support LAM continue to require the addresses written to
> > > > > > > > control registers or MSRs to be 57-bit canonical if the processor _supports_
> > > > > > > > 5-level paging or 48-bit canonical if it supports only 4-level paging"
> > > > > > > > 
> > > > > > > > [1]: https://cdrdv2.intel.com/v1/dl/getContent/671368
> > > > > > > > 
> > > > > > > > Suggested-by: Chao Gao <chao.gao@intel.com>
> > > > > > > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > > > > > ---
> > > > > > > >  arch/x86/kvm/x86.c | 11 ++++++++++-
> > > > > > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > > index a6968eadd418..3582f0bb7644 100644
> > > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > > @@ -1844,7 +1844,16 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
> > > > > > > >         case MSR_KERNEL_GS_BASE:
> > > > > > > >         case MSR_CSTAR:
> > > > > > > >         case MSR_LSTAR:
> > > > > > > > -               if (is_noncanonical_address(data, vcpu))
> > > > > > > > +
> > > > > > > > +               /*
> > > > > > > > +                * Both AMD and Intel cpus allow values which
> > > > > > > > +                * are canonical in the 5 level paging mode but are not
> > > > > > > > +                * canonical in the 4 level paging mode to be written
> > > > > > > > +                * to the above MSRs, as long as the host CPU supports
> > > > > > > > +                * 5 level paging, regardless of the state of the CR4.LA57.
> > > > > > > > +                */
> > > > > > > > +               if (!__is_canonical_address(data,
> > > > > > > > +                       kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48))
> > > > 
> > > > Please align indentation.
> > > > 
> > > > Checking kvm_cpu_cap_has() is wrong.  What the _host_ supports is irrelevant,
> > > > what matters is what the guest CPU supports, i.e. this should check guest CPUID.
> > > > Ah, but for safety, KVM also needs to check kvm_cpu_cap_has() to prevent faulting
> > > > on a bad load into hardware.  Which means adding a "governed" feature until my
> > > > CPUID rework lands.

Well, the problem is that we pass these MSRs through, which means that the
guest can modify them at will, and only ucode can prevent it from doing so.

So even if 5-level paging is disabled in the guest's CPUID but the host
supports it, nothing prevents the guest from writing a non-canonical value
to one of those MSRs, and later KVM, during migration or just KVM_SET_MSRS,
will fail.

Thus I used kvm_cpu_cap_has on purpose, to make KVM follow the actual ucode
behavior.

> > > > 
> > > > And I'm pretty sure this fix is incomplete, as nVMX's consistency checks on MSRs
> > > > that are loaded via dedicated VMCS fields likely need the same treatment, e.g.
> > > > presumably these checks should follow the MSR handling.
> > > > 
> > > >         if (CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_esp, vcpu)) ||
> > > >             CC(is_noncanonical_address(vmcs12->host_ia32_sysenter_eip, vcpu)))
> > > >                 return -EINVAL;
> > > > 
> > > > 
> > > >             (CC(is_noncanonical_address(vmcs12->guest_bndcfgs & PAGE_MASK, vcpu)) ||
> > > > 
> > > > So I think we probably need a dedicated helper for MSRs.

This is a long story - I didn't want to make this patch explode in size,
especially since it wasn't clear whether this is architectural behavior.

Even now I can't be sure that it is architectural behavior - I haven't
found anything in the latest official PRM of either AMD or Intel.

I'll add a separate patch to fix the nested code path in the next version
of this series.

> > > > 
> > > > Hmm, and I suspect these are wrong too, but in a different way.  Toggling host
> > > > LA57 on VM-Exit is legal[*], so logically, KVM should use CR4.LA57 from
> > > > vmcs12->host_cr4, not the vCPU's current CR4 value.  Which makes me _really_
> > > > curious if Intel CPUs actually get that right.
> > > > 
> > > >         if (CC(is_noncanonical_address(vmcs12->host_fs_base, vcpu)) ||
> > > >             CC(is_noncanonical_address(vmcs12->host_gs_base, vcpu)) ||
> > > >             CC(is_noncanonical_address(vmcs12->host_gdtr_base, vcpu)) ||
> > > >             CC(is_noncanonical_address(vmcs12->host_idtr_base, vcpu)) ||
> > > >             CC(is_noncanonical_address(vmcs12->host_tr_base, vcpu)) ||
> > > >             CC(is_noncanonical_address(vmcs12->host_rip, vcpu)))
> > > >                 return -EINVAL;

This is a good question, I'll check this on bare metal.

Best regards,
 Maxim Levitsky

> > > > 
> > > > [*] https://lore.kernel.org/all/20210622211124.3698119-1-seanjc@google.com
> > > > 



* Re: [PATCH v2 1/2] KVM: x86: relax canonical check for some x86 architectural msrs
  2024-08-05 11:01     ` mlevitsk
@ 2024-08-05 16:39       ` Sean Christopherson
  2024-08-05 18:07         ` mlevitsk
  0 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2024-08-05 16:39 UTC (permalink / raw)
  To: mlevitsk
  Cc: kvm, linux-kernel, Borislav Petkov, H. Peter Anvin, Paolo Bonzini,
	Ingo Molnar, x86, Thomas Gleixner, Dave Hansen, Chao Gao

On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> У пт, 2024-08-02 у 08:53 -0700, Sean Christopherson пише:
> > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > > > index a6968eadd418..3582f0bb7644 100644
> > > > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > > > @@ -1844,7 +1844,16 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
> > > > > > > > >         case MSR_KERNEL_GS_BASE:
> > > > > > > > >         case MSR_CSTAR:
> > > > > > > > >         case MSR_LSTAR:
> > > > > > > > > -               if (is_noncanonical_address(data, vcpu))
> > > > > > > > > +
> > > > > > > > > +               /*
> > > > > > > > > +                * Both AMD and Intel cpus allow values which
> > > > > > > > > +                * are canonical in the 5 level paging mode but are not
> > > > > > > > > +                * canonical in the 4 level paging mode to be written
> > > > > > > > > +                * to the above MSRs, as long as the host CPU supports
> > > > > > > > > +                * 5 level paging, regardless of the state of the CR4.LA57.
> > > > > > > > > +                */
> > > > > > > > > +               if (!__is_canonical_address(data,
> > > > > > > > > +                       kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48))
> > > > > 
> > > > > Please align indentation.
> > > > > 
> > > > > Checking kvm_cpu_cap_has() is wrong.  What the _host_ supports is irrelevant,
> > > > > what matters is what the guest CPU supports, i.e. this should check guest CPUID.
> > > > > Ah, but for safety, KVM also needs to check kvm_cpu_cap_has() to prevent faulting
> > > > > on a bad load into hardware.  Which means adding a "governed" feature until my
> > > > > CPUID rework lands.
> 
> Well the problem is that we passthrough these MSRs, and that means that the guest
> can modify them at will, and only ucode can prevent it from doing so.
> 
> So even if the 5 level paging is disabled in the guest's CPUID, but host supports it,
> nothing will prevent the guest to write non canonical value to one of those MSRs, 
> and later KVM during migration or just KVM_SET_SREGS will fail.
 
Ahh, and now I recall the discussions around the virtualization holes with LA57.

> Thus I used kvm_cpu_cap_has on purpose to make KVM follow the actual ucode
> behavior.

I'm leaning towards having KVM do the right thing when emulation happens to be
triggered.  If KVM checks kvm_cpu_cap_has() instead of guest_cpu_cap_has() (looking
at the future), then KVM will extend the virtualization hole to MSRs that are
never passed through, and also to the nested VMX checks.  Or I suppose we could
add separate helpers for passthrough MSRs vs. non-passthrough, but that seems
like it'd add very little value and a lot of maintenance burden.

Practically speaking, outside of tests, I can't imagine the guest will ever care
if there is inconsistent behavior with respect to loading non-canonical values
into MSRs.


* Re: [PATCH v2 1/2] KVM: x86: relax canonical check for some x86 architectural msrs
  2024-08-05 16:39       ` Sean Christopherson
@ 2024-08-05 18:07         ` mlevitsk
  2024-08-05 20:54           ` Sean Christopherson
  0 siblings, 1 reply; 9+ messages in thread
From: mlevitsk @ 2024-08-05 18:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Borislav Petkov, H. Peter Anvin, Paolo Bonzini,
	Ingo Molnar, x86, Thomas Gleixner, Dave Hansen, Chao Gao

У пн, 2024-08-05 у 09:39 -0700, Sean Christopherson пише:
> On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> > У пт, 2024-08-02 у 08:53 -0700, Sean Christopherson пише:
> > > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > > > > index a6968eadd418..3582f0bb7644 100644
> > > > > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > > > > @@ -1844,7 +1844,16 @@ static int __kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data,
> > > > > > > > > >         case MSR_KERNEL_GS_BASE:
> > > > > > > > > >         case MSR_CSTAR:
> > > > > > > > > >         case MSR_LSTAR:
> > > > > > > > > > -               if (is_noncanonical_address(data, vcpu))
> > > > > > > > > > +
> > > > > > > > > > +               /*
> > > > > > > > > > +                * Both AMD and Intel cpus allow values which
> > > > > > > > > > +                * are canonical in the 5 level paging mode but are not
> > > > > > > > > > +                * canonical in the 4 level paging mode to be written
> > > > > > > > > > +                * to the above MSRs, as long as the host CPU supports
> > > > > > > > > > +                * 5 level paging, regardless of the state of the CR4.LA57.
> > > > > > > > > > +                */
> > > > > > > > > > +               if (!__is_canonical_address(data,
> > > > > > > > > > +                       kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48))
> > > > > > 
> > > > > > Please align indentation.
> > > > > > 
> > > > > > Checking kvm_cpu_cap_has() is wrong.  What the _host_ supports is irrelevant,
> > > > > > what matters is what the guest CPU supports, i.e. this should check guest CPUID.
> > > > > > Ah, but for safety, KVM also needs to check kvm_cpu_cap_has() to prevent faulting
> > > > > > on a bad load into hardware.  Which means adding a "governed" feature until my
> > > > > > CPUID rework lands.
> > 
> > Well the problem is that we passthrough these MSRs, and that means that the guest
> > can modify them at will, and only ucode can prevent it from doing so.
> > 
> > So even if the 5 level paging is disabled in the guest's CPUID, but host supports it,
> > nothing will prevent the guest to write non canonical value to one of those MSRs, 
> > and later KVM during migration or just KVM_SET_SREGS will fail.
>  
> Ahh, and now I recall the discussions around the virtualization holes with LA57.
> 
> > Thus I used kvm_cpu_cap_has on purpose to make KVM follow the actual ucode
> > behavior.
> 
> I'm leaning towards having KVM do the right thing when emulation happens to be
> triggered.  If KVM checks kvm_cpu_cap_has() instead of guest_cpu_cap_has() (looking
> at the future), then KVM will extend the virtualization hole to MSRs that are
> never passed through, and also to the nested VMX checks.  Or I suppose we could
> add separate helpers for passthrough MSRs vs. non-passthrough, but that seems
> like it'd add very little value and a lot of maintenance burden.
> 
> Practically speaking, outside of tests, I can't imagine the guest will ever care
> if there is inconsistent behavior with respect to loading non-canonical values
> into MSRs.
> 

Hi,

If we weren't allowing the guest (and even a nested guest, assuming the L1
hypervisor allows it) to write these MSRs directly, I would have agreed
with you, but we do allow this.

This means that, for example, an L2 - even a malicious L2 - can deliberately
write a non-canonical value to one of these MSRs, and later on KVM could
kill the L0 due to the canonical check.

Or L1 (not Linux, because it only allows canonical GS_BASE/FS_BASE values)
could let untrusted userspace write any value to, say, GS_BASE, thus
allowing malicious L1 userspace to crash L1 (also a security violation).

IMHO if we really want to do it right, we need to disable pass-through of
these MSRs when the ucode check is more lax than our check, that is, when
L1 runs without 5-level paging enabled but L0 supports it.

I don't know how common this configuration is, and thus how much this
would affect performance.

Best regards,
	Maxim Levitsky




* Re: [PATCH v2 1/2] KVM: x86: relax canonical check for some x86 architectural msrs
  2024-08-05 18:07         ` mlevitsk
@ 2024-08-05 20:54           ` Sean Christopherson
  0 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2024-08-05 20:54 UTC (permalink / raw)
  To: mlevitsk
  Cc: kvm, linux-kernel, Borislav Petkov, H. Peter Anvin, Paolo Bonzini,
	Ingo Molnar, x86, Thomas Gleixner, Dave Hansen, Chao Gao

On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> У пн, 2024-08-05 у 09:39 -0700, Sean Christopherson пише:
> > On Mon, Aug 05, 2024, mlevitsk@redhat.com wrote:
> > > У пт, 2024-08-02 у 08:53 -0700, Sean Christopherson пише:
> > > > > > > Checking kvm_cpu_cap_has() is wrong.  What the _host_ supports is irrelevant,
> > > > > > > what matters is what the guest CPU supports, i.e. this should check guest CPUID.
> > > > > > > Ah, but for safety, KVM also needs to check kvm_cpu_cap_has() to prevent faulting
> > > > > > > on a bad load into hardware.  Which means adding a "governed" feature until my
> > > > > > > CPUID rework lands.
> > > 
> > > Well the problem is that we passthrough these MSRs, and that means that the guest
> > > can modify them at will, and only ucode can prevent it from doing so.
> > > 
> > > So even if the 5 level paging is disabled in the guest's CPUID, but host supports it,
> > > nothing will prevent the guest to write non canonical value to one of those MSRs, 
> > > and later KVM during migration or just KVM_SET_SREGS will fail.
> >  
> > Ahh, and now I recall the discussions around the virtualization holes with LA57.
> > 
> > > Thus I used kvm_cpu_cap_has on purpose to make KVM follow the actual ucode
> > > behavior.
> > 
> > I'm leaning towards having KVM do the right thing when emulation happens to be
> > triggered.  If KVM checks kvm_cpu_cap_has() instead of guest_cpu_cap_has() (looking
> > at the future), then KVM will extend the virtualization hole to MSRs that are
> > never passed through, and also to the nested VMX checks.  Or I suppose we could
> > add separate helpers for passthrough MSRs vs. non-passthrough, but that seems
> > like it'd add very little value and a lot of maintenance burden.
> > 
> > Practically speaking, outside of tests, I can't imagine the guest will ever care
> > if there is inconsistent behavior with respect to loading non-canonical values
> > into MSRs.
> > 
> 
> Hi,
> 
> If we weren't allowing the guest (and even nested guest assuming that L1
> hypervisor allows it) to write these MSRs directly, I would have agreed with
> you, but we do allow this.
> 
> This means that for example a L2, even a malicious L2, can on purpose write
> non canonical value to one of these MSRs, and later on, KVM could kill the L0
                                                                             L1?
> due to canonical check.

Ugh, right, if L1 manually saves/restores MSRs and happens to trigger emulation
on WRMSR at the "wrong" time.

Host userspace save/restore would suffer the same problem.  We could grant host
userspace accesses an exception, but that's rather pointless.

> Or L1 (not Linux, because it only lets canonical GS_BASE/FS_BASE), allow the
> untrusted userspace to write any value to say GS_BASE, thus allowing
> malicious L1 userspace to crash L1 (also a security violation).

FWIW, I don't think this is possible.  WR{FS,GS}BASE and other instructions that
load FS/GS.base honor CR4.LA57, it's only WRMSR that does not.

> IMHO if we really want to do it right, we need to disable pass-though of
> these MSRs if ucode check is more lax than our check, that is if L1 is
> running without 5 level paging enabled but L0 does have it supported.
>
> I don't know if this configuration is common, and thus how much this will
> affect performance.

MSR_FS_BASE and MSR_KERNEL_GS_BASE are hot spots when WR{FS,GS}BASE are
unsupported, or if the guest kernel doesn't utilize those instructions.

All in all, I agree it's not worth trying to plug the virtualization hole for MSRs,
especially since mimicking hardware yields much simpler code overall.  E.g. add
a dedicated MSR helper, and have that one check kvm_cpu_cap_has(), including in
VM-Entry flows, but keep the existing is_noncanonical_address() for all non-WRMSR
paths.

Something like this?

static inline bool is_noncanonical_msr_value(u64 la)
{
	u8 virt_addr_bits = kvm_cpu_cap_has(X86_FEATURE_LA57) ? 57 : 48;

	return !__is_canonical_address(la, virt_addr_bits);
}

