From: Avi Kivity <avi@redhat.com>
To: "Nadav Har'El" <nyh@il.ibm.com>
Cc: kvm@vger.kernel.org, gleb@redhat.com
Subject: Re: [PATCH 20/27] nVMX: Deciding if L0 or L1 should handle an L2 exit
Date: Wed, 20 Oct 2010 14:13:30 +0200	[thread overview]
Message-ID: <4CBEDCEA.1020507@redhat.com> (raw)
In-Reply-To: <201010171013.o9HADnE1029533@rice.haifa.ibm.com>

  On 10/17/2010 12:13 PM, Nadav Har'El wrote:
> This patch contains the logic of whether an L2 exit should be handled by L0
> and then L2 should be resumed, or whether L1 should be run to handle this
> exit (using the nested_vmx_vmexit() function of the previous patch).
>
> The basic idea is to let L1 handle the exit only if it actually asked to
> trap this sort of event. For example, when L2 exits on a change to CR0,
> we check L1's CR0_GUEST_HOST_MASK to see if L1 expressed interest in any
> bit which changed; if it did, we exit to L1. If it didn't, then it was we
> (L0) who wished to trap this event, so we handle it ourselves.
>
> The next two patches add the logic for deciding what to do when an interrupt
> or exception is injected: does L0 need to do it, should we exit to L1 to do
> it, or should we resume L2 and keep the exception pending to be injected
> later.
>
> We keep a new flag, "nested_run_pending", which can override the decision of
> whether L1 or L2 should run next. nested_run_pending=1 means that we *must*
> run L2 next, not L1. This is necessary in particular when L1 has done a
> VMLAUNCH of L2 and therefore expects L2 to be run (and perhaps to be injected
> with an event it specified, etc.). The nested_run_pending flag is especially
> intended to avoid switching to L1 at the injection decision point described
> above.
>
>
>   /*
> + * Return 1 if we should exit from L2 to L1 to handle an MSR access,
> + * rather than handle it ourselves in L0. I.e., check L1's MSR bitmap whether
> + * it expressed interest in the current event (read or write a specific MSR).
> + */
> +static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
> +	struct vmcs_fields *vmcs12, u32 exit_reason)
> +{
> +	u32 msr_index = vcpu->arch.regs[VCPU_REGS_RCX];
> +	struct page *msr_bitmap_page;
> +	void *va;
> +	bool ret;
> +
> +	if (!cpu_has_vmx_msr_bitmap() || !nested_cpu_has_vmx_msr_bitmap(vcpu))
> +		return 1;
> +
> +	msr_bitmap_page = nested_get_page(vcpu, vmcs12->msr_bitmap);
> +	if (!msr_bitmap_page) {
> +		printk(KERN_INFO "%s error in nested_get_page\n", __func__);
> +		return 0;

return leaks the page.

> +	}
> +
> +	va = kmap_atomic(msr_bitmap_page, KM_USER1);
> +	if (exit_reason == EXIT_REASON_MSR_WRITE)
> +		va += 0x800;
> +	if (msr_index >= 0xc0000000) {
> +		msr_index -= 0xc0000000;
> +		va += 0x400;
> +	}
> +	if (msr_index > 0x1fff)
> +		return 0;

return leaks the kmap.

> +	ret = test_bit(msr_index, va);
> +	kunmap_atomic(va, KM_USER1);
> +	return ret;
> +}

How about using kvm_read_guest() instead?  Much simpler and safer.
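
Something like this (untested sketch, just to illustrate the idea; it keeps
the same bitmap layout arithmetic as above) would avoid both the page
reference and the kmap:

static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
	struct vmcs_fields *vmcs12, u32 exit_reason)
{
	u32 msr_index = vcpu->arch.regs[VCPU_REGS_RCX];
	gpa_t bitmap = vmcs12->msr_bitmap;
	u8 b;

	if (!cpu_has_vmx_msr_bitmap() || !nested_cpu_has_vmx_msr_bitmap(vcpu))
		return 1;

	/*
	 * The MSR bitmap page is split into four 1024-byte bitmaps:
	 * low-MSR reads, high-MSR reads, low-MSR writes, high-MSR writes.
	 */
	if (exit_reason == EXIT_REASON_MSR_WRITE)
		bitmap += 0x800;
	if (msr_index >= 0xc0000000) {
		msr_index -= 0xc0000000;
		bitmap += 0x400;
	}
	if (msr_index > 0x1fff)
		return 0;

	/* Read only the byte containing the bit for this MSR. */
	if (kvm_read_guest(vcpu->kvm, bitmap + msr_index / 8, &b, 1))
		return 0;	/* couldn't read L1's bitmap; handle in L0 */
	return b & (1 << (msr_index & 7));
}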

> +
> +/*
> + * Return 1 if we should exit from L2 to L1 to handle a CR access exit,
> + * rather than handle it ourselves in L0. I.e., check if L1 wanted to
> + * intercept (via guest_host_mask etc.) the current event.
> + */
> +static bool nested_vmx_exit_handled_cr(struct kvm_vcpu *vcpu,
> +	struct vmcs_fields *vmcs12)
> +{
> +	unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> +	int cr = exit_qualification & 15;
> +	int reg = (exit_qualification >> 8) & 15;
> +	unsigned long val = kvm_register_read(vcpu, reg);
> +
> +	switch ((exit_qualification >> 4) & 3) {
> +	case 0: /* mov to cr */
> +		switch (cr) {
> +		case 0:
> +			if (vmcs12->cr0_guest_host_mask &
> +			    (val ^ vmcs12->cr0_read_shadow))
> +				return 1;
> +			break;
> +		case 3:
> +			if (vmcs12->cpu_based_vm_exec_control &
> +			    CPU_BASED_CR3_LOAD_EXITING)
> +				return 1;
> +			break;
> +		case 4:
> +			if (vmcs12->cr4_guest_host_mask &
> +			    (vmcs12->cr4_read_shadow ^ val))
> +				return 1;
> +			break;
> +		case 8:
> +			if (vmcs12->cpu_based_vm_exec_control &
> +			    CPU_BASED_CR8_LOAD_EXITING)
> +				return 1;
> +			/*
> +			 * TODO: missing else if control & CPU_BASED_TPR_SHADOW
> +			 * then set tpr shadow and if below tpr_threshold, exit.
> +			 */
> +			break;
> +		}
> +		break;
> +	case 2: /* clts */
> +		if (vmcs12->cr0_guest_host_mask & X86_CR0_TS)
> +			return 1;

If TS is already clear in the guest's cr0_read_shadow, an L2->L1 exit is 
not needed.
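
I.e., only exit when L1 owns TS and its read shadow actually has TS set;
roughly (sketch):

	case 2: /* clts */
		if ((vmcs12->cr0_guest_host_mask & X86_CR0_TS) &&
		    (vmcs12->cr0_read_shadow & X86_CR0_TS))
			return 1;
		break;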

> +		break;
> +	case 1: /* mov from cr */
> +		switch (cr) {
> +		case 0:
> +			return 1;

Cannot happen.

> +		case 3:
> +			if (vmcs12->cpu_based_vm_exec_control &
> +			    CPU_BASED_CR3_STORE_EXITING)
> +				return 1;
> +			break;
> +		case 4:
> +			return 1;
> +			break;

Cannot happen.

> +		case 8:
> +			if (vmcs12->cpu_based_vm_exec_control &
> +			    CPU_BASED_CR8_STORE_EXITING)
> +				return 1;

What about TPR threshold?  Or is it not supported yet?

> +			break;
> +		}
> +		break;
> +	case 3: /* lmsw */
> +		/*
> +		 * lmsw can change bits 1..3 of cr0, and only set bit 0 of
> +		 * cr0. Other attempted changes are ignored, with no exit.
> +		 */
> +		if (vmcs12->cr0_guest_host_mask & 0xe &
> +		    (val ^ vmcs12->cr0_read_shadow))
> +			return 1;
> +		if ((vmcs12->cr0_guest_host_mask & 0x1) &&
> +		    !(vmcs12->cr0_read_shadow & 0x1) &&
> +		    (val & 0x1))
> +			return 1;
> +		break;
> +	}
> +	return 0;
> +}

I'd prefer to move the intercept checks to kvm_set_cr(), but that can be 
done later (much later).

> +
> +/*
> + * Return 1 if we should exit from L2 to L1 to handle an exit, or 0 if we
> + * should handle it ourselves in L0 (and then continue L2). Only call this
> + * when in nested_mode (L2).
> + */
> +static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
> +{
> +	u32 exit_reason = vmcs_read32(VM_EXIT_REASON);
> +	u32 intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	struct vmcs_fields *vmcs12 = get_vmcs12_fields(vcpu);
> +
> +	if (vmx->nested.nested_run_pending)
> +		return 0;
> +
> +	if (unlikely(vmx->fail)) {
> +		printk(KERN_INFO "%s failed vm entry %x\n",
> +		       __func__, vmcs_read32(VM_INSTRUCTION_ERROR));
> +		return 1;
> +	}
> +
> +	switch (exit_reason) {
> +	case EXIT_REASON_EXTERNAL_INTERRUPT:
> +		return 0;
> +	case EXIT_REASON_EXCEPTION_NMI:
> +		if (!is_exception(intr_info))
> +			return 0;
> +		else if (is_page_fault(intr_info) && (!enable_ept))
> +			return 0;

We may still find out later that the page fault needs to be intercepted 
by the guest, yes?

> +		return (vmcs12->exception_bitmap &
> +				(1u << (intr_info & INTR_INFO_VECTOR_MASK)));
> +	case EXIT_REASON_EPT_VIOLATION:
> +		return 0;
> +	case EXIT_REASON_INVLPG:
> +		return (vmcs12->cpu_based_vm_exec_control &
> +				CPU_BASED_INVLPG_EXITING);
> +	case EXIT_REASON_MSR_READ:
> +	case EXIT_REASON_MSR_WRITE:
> +		return nested_vmx_exit_handled_msr(vcpu, vmcs12, exit_reason);
> +	case EXIT_REASON_CR_ACCESS:
> +		return nested_vmx_exit_handled_cr(vcpu, vmcs12);
> +	case EXIT_REASON_DR_ACCESS:
> +		return (vmcs12->cpu_based_vm_exec_control &
> +				CPU_BASED_MOV_DR_EXITING);
> +	default:
> +		/*
> +		 * One particularly interesting case that is covered here is an
> +		 * exit caused by L2 running a VMX instruction. L2 is guest
> +		 * mode in L1's world, and according to the VMX spec running a
> +		 * VMX instruction in guest mode should cause an exit to root
> +		 * mode, i.e., to L1. This is why we need to return r=1 for
> +		 * those exit reasons too. This enables further nesting: Like
> +		 * L0 emulates VMX for L1, we now allow L1 to emulate VMX for
> +		 * L2, who will then be able to run L3.
> +		 */
> +		return 1;

What about intr/nmi window?
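
Presumably those should go to L1 whenever it asked for the window exit,
something like (untested sketch, using the names from vmx.h):

	case EXIT_REASON_PENDING_INTERRUPT:
		return (vmcs12->cpu_based_vm_exec_control &
				CPU_BASED_VIRTUAL_INTR_PENDING);
	case EXIT_REASON_NMI_WINDOW:
		return (vmcs12->cpu_based_vm_exec_control &
				CPU_BASED_VIRTUAL_NMI_PENDING);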

-- 
error compiling committee.c: too many arguments to function


