From: Avi Kivity <avi@redhat.com>
To: "Nadav Har'El" <nyh@il.ibm.com>
Cc: kvm@vger.kernel.org, gleb@redhat.com
Subject: Re: [PATCH 20/27] nVMX: Deciding if L0 or L1 should handle an L2 exit
Date: Wed, 20 Oct 2010 14:13:30 +0200
Message-ID: <4CBEDCEA.1020507@redhat.com>
In-Reply-To: <201010171013.o9HADnE1029533@rice.haifa.ibm.com>
On 10/17/2010 12:13 PM, Nadav Har'El wrote:
> This patch contains the logic for deciding whether an L2 exit should be
> handled by L0, with L2 then resumed, or whether L1 should be run to handle
> this exit (using the nested_vmx_vmexit() function of the previous patch).
>
> The basic idea is to let L1 handle the exit only if it actually asked to
> trap this sort of event. For example, when L2 exits on a change to CR0,
> we check L1's CR0_GUEST_HOST_MASK to see if L1 expressed interest in any
> bit which changed; if it did, we exit to L1. If it didn't, then it was we
> (L0) who wished to trap this event, so we handle it ourselves.
>
> The next two patches add additional logic for deciding what to do when an
> interrupt or exception is injected: does L0 need to do it, should we exit to
> L1 to do it, or should we resume L2 and keep the exception pending to be
> injected later.
>
> We keep a new flag, "nested_run_pending", which can override the decision of
> which should run next, L1 or L2. nested_run_pending=1 means that we *must* run
> L2 next, not L1. This is necessary in particular when L1 did a VMLAUNCH of L2
> and therefore expects L2 to be run (and perhaps be injected with an event it
> specified, etc.). nested_run_pending is especially intended to avoid switching
> to L1 at the injection decision points described above.
>
>
> /*
> + * Return 1 if we should exit from L2 to L1 to handle an MSR access,
> + * rather than handle it ourselves in L0. I.e., check in L1's MSR bitmap
> + * whether it expressed interest in the current event (read or write of a
> + * specific MSR).
> + */
> +static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
> +        struct vmcs_fields *vmcs12, u32 exit_reason)
> +{
> +        u32 msr_index = vcpu->arch.regs[VCPU_REGS_RCX];
> +        struct page *msr_bitmap_page;
> +        void *va;
> +        bool ret;
> +
> +        if (!cpu_has_vmx_msr_bitmap() || !nested_cpu_has_vmx_msr_bitmap(vcpu))
> +                return 1;
> +
> +        msr_bitmap_page = nested_get_page(vcpu, vmcs12->msr_bitmap);
> +        if (!msr_bitmap_page) {
> +                printk(KERN_INFO "%s error in nested_get_page\n", __func__);
> +                return 0;
return leaks the page.
> +        }
> +
> +        va = kmap_atomic(msr_bitmap_page, KM_USER1);
> +        if (exit_reason == EXIT_REASON_MSR_WRITE)
> +                va += 0x800;
> +        if (msr_index >= 0xc0000000) {
> +                msr_index -= 0xc0000000;
> +                va += 0x400;
> +        }
> +        if (msr_index > 0x1fff)
> +                return 0;
return leaks the kmap.
> +        ret = test_bit(msr_index, va);
> +        kunmap_atomic(va, KM_USER1);
> +        return ret;
> +}
How about using kvm_read_guest() instead? Much simpler and safer.
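Untested sketch of what I mean, keeping the names used in this series and
moving the range check before any guest memory is touched:

        u32 offset = (exit_reason == EXIT_REASON_MSR_WRITE) ? 0x800 : 0;
        u8 b;

        if (msr_index >= 0xc0000000) {
                msr_index -= 0xc0000000;
                offset += 0x400;
        }
        if (msr_index > 0x1fff)
                return 0;
        if (kvm_read_guest(vcpu->kvm,
                           vmcs12->msr_bitmap + offset + msr_index / 8,
                           &b, 1))
                return 1;       /* can't read L1's bitmap; be safe, exit to L1 */
        return b & (1 << (msr_index & 7));

No page pinning, no kmap, and the early returns can't leak anything.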
> +
> +/*
> + * Return 1 if we should exit from L2 to L1 to handle a CR access exit,
> + * rather than handle it ourselves in L0. I.e., check if L1 wanted to
> + * intercept (via guest_host_mask etc.) the current event.
> + */
> +static bool nested_vmx_exit_handled_cr(struct kvm_vcpu *vcpu,
> +        struct vmcs_fields *vmcs12)
> +{
> +        unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
> +        int cr = exit_qualification & 15;
> +        int reg = (exit_qualification >> 8) & 15;
> +        unsigned long val = kvm_register_read(vcpu, reg);
> +
> +        switch ((exit_qualification >> 4) & 3) {
> +        case 0: /* mov to cr */
> +                switch (cr) {
> +                case 0:
> +                        if (vmcs12->cr0_guest_host_mask &
> +                            (val ^ vmcs12->cr0_read_shadow))
> +                                return 1;
> +                        break;
> +                case 3:
> +                        if (vmcs12->cpu_based_vm_exec_control &
> +                            CPU_BASED_CR3_LOAD_EXITING)
> +                                return 1;
> +                        break;
> +                case 4:
> +                        if (vmcs12->cr4_guest_host_mask &
> +                            (vmcs12->cr4_read_shadow ^ val))
> +                                return 1;
> +                        break;
> +                case 8:
> +                        if (vmcs12->cpu_based_vm_exec_control &
> +                            CPU_BASED_CR8_LOAD_EXITING)
> +                                return 1;
> +                        /*
> +                         * TODO: missing else if control & CPU_BASED_TPR_SHADOW
> +                         * then set tpr shadow and if below tpr_threshold, exit.
> +                         */
> +                        break;
> +                }
> +                break;
> +        case 2: /* clts */
> +                if (vmcs12->cr0_guest_host_mask & X86_CR0_TS)
> +                        return 1;
If TS is already clear in the guest's cr0_read_shadow, an L2->L1 exit is
not needed.
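Something like the following (untested sketch) would avoid the spurious exit
when TS is already clear in L1's view:

        case 2: /* clts */
                if ((vmcs12->cr0_guest_host_mask & X86_CR0_TS) &&
                    (vmcs12->cr0_read_shadow & X86_CR0_TS))
                        return 1;
                break;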
> +                break;
> +        case 1: /* mov from cr */
> +                switch (cr) {
> +                case 0:
> +                        return 1;
Cannot happen.
> +                case 3:
> +                        if (vmcs12->cpu_based_vm_exec_control &
> +                            CPU_BASED_CR3_STORE_EXITING)
> +                                return 1;
> +                        break;
> +                case 4:
> +                        return 1;
> +                        break;
Cannot happen.
> +                case 8:
> +                        if (vmcs12->cpu_based_vm_exec_control &
> +                            CPU_BASED_CR8_STORE_EXITING)
> +                                return 1;
What about TPR threshold? Or is it not supported yet?
> +                        break;
> +                }
> +                break;
> +        case 3: /* lmsw */
> +                /*
> +                 * lmsw can change bits 1..3 of cr0, and only set bit 0 of
> +                 * cr0. Other attempted changes are ignored, with no exit.
> +                 */
> +                if (vmcs12->cr0_guest_host_mask & 0xe &
> +                    (val ^ vmcs12->cr0_read_shadow))
> +                        return 1;
> +                if ((vmcs12->cr0_guest_host_mask & 0x1) &&
> +                    !(vmcs12->cr0_read_shadow & 0x1) &&
> +                    (val & 0x1))
> +                        return 1;
> +                break;
> +        }
> +        return 0;
> +}
I'd prefer to move the intercept checks to kvm_set_cr(), but that can be
done later (much later).
> +
> +/*
> + * Return 1 if we should exit from L2 to L1 to handle an exit, or 0 if we
> + * should handle it ourselves in L0 (and then continue L2). Only call this
> + * when in nested_mode (L2).
> + */
> +static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
> +{
> +        u32 exit_reason = vmcs_read32(VM_EXIT_REASON);
> +        u32 intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> +        struct vcpu_vmx *vmx = to_vmx(vcpu);
> +        struct vmcs_fields *vmcs12 = get_vmcs12_fields(vcpu);
> +
> +        if (vmx->nested.nested_run_pending)
> +                return 0;
> +
> +        if (unlikely(vmx->fail)) {
> +                printk(KERN_INFO "%s failed vm entry %x\n",
> +                       __func__, vmcs_read32(VM_INSTRUCTION_ERROR));
> +                return 1;
> +        }
> +
> +        switch (exit_reason) {
> +        case EXIT_REASON_EXTERNAL_INTERRUPT:
> +                return 0;
> +        case EXIT_REASON_EXCEPTION_NMI:
> +                if (!is_exception(intr_info))
> +                        return 0;
> +                else if (is_page_fault(intr_info) && (!enable_ept))
> +                        return 0;
We may still find out later that the page fault needs to be intercepted
by the guest, yes?
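For reference, when that shortcut is eventually refined, the SDM rule is that
a #PF exits iff bit 14 of the exception bitmap agrees with the result of the
PFEC mask/match test. Rough sketch only, with error_code being the fault's
error code and assuming vmcs12 carries the PAGE_FAULT_ERROR_CODE_{MASK,MATCH}
fields under these names:

        bool match = (error_code & vmcs12->page_fault_error_code_mask) ==
                     vmcs12->page_fault_error_code_match;
        return match == !!(vmcs12->exception_bitmap & (1u << PF_VECTOR));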
> +                return (vmcs12->exception_bitmap &
> +                        (1u << (intr_info & INTR_INFO_VECTOR_MASK)));
> +        case EXIT_REASON_EPT_VIOLATION:
> +                return 0;
> +        case EXIT_REASON_INVLPG:
> +                return (vmcs12->cpu_based_vm_exec_control &
> +                        CPU_BASED_INVLPG_EXITING);
> +        case EXIT_REASON_MSR_READ:
> +        case EXIT_REASON_MSR_WRITE:
> +                return nested_vmx_exit_handled_msr(vcpu, vmcs12, exit_reason);
> +        case EXIT_REASON_CR_ACCESS:
> +                return nested_vmx_exit_handled_cr(vcpu, vmcs12);
> +        case EXIT_REASON_DR_ACCESS:
> +                return (vmcs12->cpu_based_vm_exec_control &
> +                        CPU_BASED_MOV_DR_EXITING);
> +        default:
> +                /*
> +                 * One particularly interesting case that is covered here is an
> +                 * exit caused by L2 running a VMX instruction. L2 is guest
> +                 * mode in L1's world, and according to the VMX spec running a
> +                 * VMX instruction in guest mode should cause an exit to root
> +                 * mode, i.e., to L1. This is why we need to return r=1 for
> +                 * those exit reasons too. This enables further nesting: just
> +                 * as L0 emulates VMX for L1, we now allow L1 to emulate VMX
> +                 * for L2, which will then be able to run L3.
> +                 */
> +                return 1;
What about intr/nmi window?
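I'd expect those to be reflected based on L1's execution controls, something
like this (untested sketch):

        case EXIT_REASON_PENDING_INTERRUPT:
                return (vmcs12->cpu_based_vm_exec_control &
                        CPU_BASED_VIRTUAL_INTR_PENDING);
        case EXIT_REASON_NMI_WINDOW:
                return (vmcs12->cpu_based_vm_exec_control &
                        CPU_BASED_VIRTUAL_NMI_PENDING);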
--
error compiling committee.c: too many arguments to function