Re: [PATCH v3] KVM: nVMX: Don't leak L1 MMIO regions to L2

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Jim Mattson <jmattson@google.com>
Cc: kvm@vger.kernel.org, Liran Alon <liran.alon@oracle.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Dan Cross <dcross@google.com>, Peter Shier <pshier@google.com>
Subject: Re: [PATCH v3] KVM: nVMX: Don't leak L1 MMIO regions to L2
Date: Mon, 14 Oct 2019 10:59:36 -0700	[thread overview]
Message-ID: <20191014175936.GD22962@linux.intel.com> (raw)
In-Reply-To: <20191010232819.135894-1-jmattson@google.com>

On Thu, Oct 10, 2019 at 04:28:19PM -0700, Jim Mattson wrote:
> If the "virtualize APIC accesses" VM-execution control is set in the
> VMCS, the APIC virtualization hardware is triggered when a page walk
> in VMX non-root mode terminates at a PTE wherein the address of the 4k
> page frame matches the APIC-access address specified in the VMCS. On
> hardware, the APIC-access address may be any valid 4k-aligned physical
> address.
> 
> KVM's nVMX implementation enforces the additional constraint that the
> APIC-access address specified in the vmcs12 must be backed by
> cacheable memory in L1. If not, L0 will simply clear the "virtualize
> APIC accesses" VM-execution control in the vmcs02.
> 
> The problem with this approach is that the L1 guest has arranged the
> vmcs12 EPT tables--or shadow page tables, if the "enable EPT"
> VM-execution control is clear in the vmcs12--so that the L2 guest
> physical address(es)--or L2 guest linear address(es)--that reference
> the L2 APIC map to the APIC-access address specified in the
> vmcs12. Without the "virtualize APIC accesses" VM-execution control in
> the vmcs02, the APIC accesses in the L2 guest will directly access the
> APIC-access page in L1.
> 
> When there is no mapping whatsoever for the APIC-access address in L1,
> the L2 VM just loses the intended APIC virtualization. However, when
> the APIC-access address is mapped to an MMIO region in L1, the L2
> guest gets direct access to the L1 MMIO device. For example, if the
> APIC-access address specified in the vmcs12 is 0xfee00000, then L2
> gets direct access to L1's APIC.
> 
> Since this vmcs12 configuration is something that KVM cannot
> faithfully emulate, the appropriate response is to exit to userspace
> with KVM_INTERNAL_ERROR_EMULATION.
> 
> Fixes: fe3ef05c7572 ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12")
> Reported-by: Dan Cross <dcross@google.com>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Peter Shier <pshier@google.com>
> Change-Id: Ib501fe63266c1d831ce4d1d55e8688bc34a6844a
> ---
> v2 -> v3: Added default case to new switch in nested_vmx_run
> v1 -> v2: Added enum enter_vmx_status
> 
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/kvm/vmx/nested.c       | 68 +++++++++++++++++++--------------
>  arch/x86/kvm/vmx/nested.h       | 13 ++++++-
>  arch/x86/kvm/x86.c              |  8 +++-
>  4 files changed, 59 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5d8056ff7390..0dee68560437 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1186,7 +1186,7 @@ struct kvm_x86_ops {
>  	int (*set_nested_state)(struct kvm_vcpu *vcpu,
>  				struct kvm_nested_state __user *user_kvm_nested_state,
>  				struct kvm_nested_state *kvm_state);
> -	void (*get_vmcs12_pages)(struct kvm_vcpu *vcpu);
> +	bool (*get_vmcs12_pages)(struct kvm_vcpu *vcpu);
>  
>  	int (*smi_allowed)(struct kvm_vcpu *vcpu);
>  	int (*pre_enter_smm)(struct kvm_vcpu *vcpu, char *smstate);
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 5e231da00310..88b2f08aaaae 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2927,7 +2927,7 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
>  static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
>  						 struct vmcs12 *vmcs12);
>  
> -static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
> +static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
>  {
>  	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -2947,19 +2947,18 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
>  			vmx->nested.apic_access_page = NULL;
>  		}
>  		page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->apic_access_addr);
> -		/*
> -		 * If translation failed, no matter: This feature asks
> -		 * to exit when accessing the given address, and if it
> -		 * can never be accessed, this feature won't do
> -		 * anything anyway.
> -		 */
>  		if (!is_error_page(page)) {
>  			vmx->nested.apic_access_page = page;
>  			hpa = page_to_phys(vmx->nested.apic_access_page);
>  			vmcs_write64(APIC_ACCESS_ADDR, hpa);
>  		} else {
> -			secondary_exec_controls_clearbit(vmx,
> -				SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
> +			pr_debug_ratelimited("%s: non-cacheable APIC-access address in vmcs12\n",
> +					     __func__);

Hmm, "non-cacheable" is confusing, especially in the context of the APIC,
which needs to be mapped "uncacheable".  Maybe just "invalid"?

> +			vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> +			vcpu->run->internal.suberror =
> +				KVM_INTERNAL_ERROR_EMULATION;
> +			vcpu->run->internal.ndata = 0;
> +			return false;
>  		}
>  	}
>  
> @@ -3004,6 +3003,7 @@ static void nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
>  		exec_controls_setbit(vmx, CPU_BASED_USE_MSR_BITMAPS);
>  	else
>  		exec_controls_clearbit(vmx, CPU_BASED_USE_MSR_BITMAPS);
> +	return true;
>  }
>  
>  /*
> @@ -3042,13 +3042,15 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
>  /*
>   * If from_vmentry is false, this is being called from state restore (either RSM
>   * or KVM_SET_NESTED_STATE).  Otherwise it's called from vmlaunch/vmresume.
> -+ *
> -+ * Returns:
> -+ *   0 - success, i.e. proceed with actual VMEnter
> -+ *   1 - consistency check VMExit
> -+ *  -1 - consistency check VMFail
> + *
> + * Returns:
> + *	ENTER_VMX_SUCCESS: Successfully entered VMX non-root mode

"Enter VMX" usually refers to VMXON, e.g. the title of VMXON in the SDM is
"Enter VMX Operation".

Maybe NVMX_ENTER_NON_ROOT_?

> + *	ENTER_VMX_VMFAIL:  Consistency check VMFail
> + *	ENTER_VMX_VMEXIT:  Consistency check VMExit
> + *	ENTER_VMX_ERROR:   KVM internal error

Probably need to more explicit than VMX_ERROR, e.g. all of the VM-Fail
defines are prefixed with VMXERR_##.

May ENTER_VMX_KVM_ERROR?  (Or NVMX_ENTER_NON_ROOT_KVM_ERROR).

>   */
> -int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
> +enum enter_vmx_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
> +						     bool from_vmentry)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> @@ -3091,11 +3093,12 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>  	prepare_vmcs02_early(vmx, vmcs12);
>  
>  	if (from_vmentry) {
> -		nested_get_vmcs12_pages(vcpu);
> +		if (unlikely(!nested_get_vmcs12_pages(vcpu)))
> +			return ENTER_VMX_ERROR;
>  
>  		if (nested_vmx_check_vmentry_hw(vcpu)) {
>  			vmx_switch_vmcs(vcpu, &vmx->vmcs01);
> -			return -1;
> +			return ENTER_VMX_VMFAIL;
>  		}
>  
>  		if (nested_vmx_check_guest_state(vcpu, vmcs12, &exit_qual))
> @@ -3159,7 +3162,7 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>  	 * returned as far as L1 is concerned. It will only return (and set
>  	 * the success flag) when L2 exits (see nested_vmx_vmexit()).
>  	 */
> -	return 0;
> +	return ENTER_VMX_SUCCESS;
>  
>  	/*
>  	 * A failed consistency check that leads to a VMExit during L1's
> @@ -3175,14 +3178,14 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>  	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>  
>  	if (!from_vmentry)
> -		return 1;
> +		return ENTER_VMX_VMEXIT;
>  
>  	load_vmcs12_host_state(vcpu, vmcs12);
>  	vmcs12->vm_exit_reason = exit_reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
>  	vmcs12->exit_qualification = exit_qual;
>  	if (enable_shadow_vmcs || vmx->nested.hv_evmcs)
>  		vmx->nested.need_vmcs12_to_shadow_sync = true;
> -	return 1;
> +	return ENTER_VMX_VMEXIT;
>  }
>  
>  /*
> @@ -3192,9 +3195,9 @@ int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry)
>  static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  {
>  	struct vmcs12 *vmcs12;
> +	enum enter_vmx_status status;
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>  	u32 interrupt_shadow = vmx_get_interrupt_shadow(vcpu);
> -	int ret;
>  
>  	if (!nested_vmx_check_permission(vcpu))
>  		return 1;
> @@ -3254,13 +3257,22 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
>  	 * the nested entry.
>  	 */
>  	vmx->nested.nested_run_pending = 1;
> -	ret = nested_vmx_enter_non_root_mode(vcpu, true);
> -	vmx->nested.nested_run_pending = !ret;
> -	if (ret > 0)
> -		return 1;
> -	else if (ret)
> -		return nested_vmx_failValid(vcpu,
> -			VMXERR_ENTRY_INVALID_CONTROL_FIELD);
> +	status = nested_vmx_enter_non_root_mode(vcpu, true);

What if we use a goto to bury the error handling at the end?  That'd also
provide some flexibility with respect to handling each failure, e.g.:

	vmx->nested.nested_run_pending = 1;
	status = nested_vmx_enter_non_root_mode(vcpu, true);
	if (status != ENTER_VMX_SUCCESS)
		goto vmenter_failed;

	...

	return 1;

vmenter_failed:
	vmx->nested.nested_run_pending = 0;
	if (status == ENTER_VMX_VMFAIL)
		return nested_vmx_failValid(vcpu,
					    VMXERR_ENTRY_INVALID_CONTROL_FIELD);

	return status == ENTER_VMX_ERROR ? 0 : 1;

or

	vmx->nested.nested_run_pending = 1;
	status = nested_vmx_enter_non_root_mode(vcpu, true);
	if (status != ENTER_VMX_SUCCESS)
		goto vmenter_failed;

	...

	return 1;

vmenter_failed:
	vmx->nested.nested_run_pending = 0;
	if (status == ENTER_VMX_ERROR)
		return 0;
	if (status == ENTER_VMX_VMEXIT)
		return 1;

	WARN_ON_ONCE(status != ENTER_VMX_VMFAIL);
	return nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
	
> +	if (status != ENTER_VMX_SUCCESS) {
> +		vmx->nested.nested_run_pending = 0;
> +		switch (status) {
> +		case ENTER_VMX_VMFAIL:
> +			return nested_vmx_failValid(vcpu,
> +				VMXERR_ENTRY_INVALID_CONTROL_FIELD);
> +		case ENTER_VMX_VMEXIT:
> +			return 1;
> +		case ENTER_VMX_ERROR:
> +			return 0;
> +		default:
> +			WARN_ON_ONCE(1);
> +			break;
> +		}
> +	}
>  
>  	/* Hide L1D cache contents from the nested guest.  */
>  	vmx->vcpu.arch.l1tf_flush_l1d = true;
> diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
> index 187d39bf0bf1..07cf5cef86f6 100644
> --- a/arch/x86/kvm/vmx/nested.h
> +++ b/arch/x86/kvm/vmx/nested.h
> @@ -6,6 +6,16 @@
>  #include "vmcs12.h"
>  #include "vmx.h"
>  
> +/*
> + * Status returned by nested_vmx_enter_non_root_mode():
> + */
> +enum enter_vmx_status {
> +	ENTER_VMX_SUCCESS,	/* Successfully entered VMX non-root mode */
> +	ENTER_VMX_VMFAIL,	/* Consistency check VMFail */
> +	ENTER_VMX_VMEXIT,	/* Consistency check VMExit */
> +	ENTER_VMX_ERROR,	/* KVM internal error */
> +};
> +
>  void vmx_leave_nested(struct kvm_vcpu *vcpu);
>  void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps,
>  				bool apicv);
> @@ -13,7 +23,8 @@ void nested_vmx_hardware_unsetup(void);
>  __init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
>  void nested_vmx_vcpu_setup(void);
>  void nested_vmx_free_vcpu(struct kvm_vcpu *vcpu);
> -int nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, bool from_vmentry);
> +enum enter_vmx_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
> +						     bool from_vmentry);
>  bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason);
>  void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>  		       u32 exit_intr_info, unsigned long exit_qualification);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f26f8be4e621..627fd7ff3a28 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7937,8 +7937,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  	bool req_immediate_exit = false;
>  
>  	if (kvm_request_pending(vcpu)) {
> -		if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu))
> -			kvm_x86_ops->get_vmcs12_pages(vcpu);
> +		if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu)) {
> +			if (unlikely(!kvm_x86_ops->get_vmcs12_pages(vcpu))) {
> +				r = 0;
> +				goto out;
> +			}
> +		}
>  		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
>  			kvm_mmu_unload(vcpu);
>  		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
> -- 
> 2.23.0.581.g78d2f28ef7-goog
>

next prev parent reply	other threads:[~2019-10-14 17:59 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-10 23:28 [PATCH v3] KVM: nVMX: Don't leak L1 MMIO regions to L2 Jim Mattson
2019-10-14 17:59 ` Sean Christopherson [this message]
2019-10-14 18:50   ` Jim Mattson
2019-10-14 19:15     ` Sean Christopherson
2019-10-14 19:37       ` Jim Mattson
2019-10-14 19:52         ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191014175936.GD22962@linux.intel.com \
    --to=sean.j.christopherson@intel.com \
    --cc=dcross@google.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=liran.alon@oracle.com \
    --cc=pbonzini@redhat.com \
    --cc=pshier@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox