Re: [PATCH 19/27] nVMX: Exiting from L2 to L1

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Avi Kivity <avi@redhat.com>
To: "Nadav Har'El" <nyh@il.ibm.com>
Cc: kvm@vger.kernel.org, gleb@redhat.com
Subject: Re: [PATCH 19/27] nVMX: Exiting from L2 to L1
Date: Sun, 17 Oct 2010 17:58:01 +0200	[thread overview]
Message-ID: <4CBB1D09.8090406@redhat.com> (raw)
In-Reply-To: <201010171013.o9HADI1v029526@rice.haifa.ibm.com>

  On 10/17/2010 12:13 PM, Nadav Har'El wrote:
> This patch implements nested_vmx_vmexit(), called when the nested L2 guest
> exits and we want to run its L1 parent and let it handle this exit.
>
> Note that this will not necessarily be called on every L2 exit. L0 may decide
> to handle a particular exit on its own, without L1's involvement; In that
> case, L0 will handle the exit, and resume running L2, without running L1 and
> without calling nested_vmx_vmexit(). The logic for deciding whether to handle
> a particular exit in L1 or in L0, i.e., whether to call nested_vmx_vmexit(),
> will appear in the next patch.
>
> Signed-off-by: Nadav Har'El<nyh@il.ibm.com>
> ---
>   arch/x86/kvm/vmx.c |  235 +++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 235 insertions(+)
>
> --- .before/arch/x86/kvm/vmx.c	2010-10-17 11:52:02.000000000 +0200
> +++ .after/arch/x86/kvm/vmx.c	2010-10-17 11:52:02.000000000 +0200
> @@ -5085,6 +5085,8 @@ static void __vmx_complete_interrupts(st
>
>   static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>   {
> +	if (vmx->nested.nested_mode)
> +		return;
>   	__vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
>   				  VM_EXIT_INSTRUCTION_LEN,
>   				  IDT_VECTORING_ERROR_CODE);
> @@ -5981,6 +5983,239 @@ static int nested_vmx_run(struct kvm_vcp
>   	return 1;
>   }
>
> +/*
> + * On a nested exit from L2 to L1, vmcs12.guest_cr0 might not be up-to-date
> + * because L2 may have changed some cr0 bits directly (see CRO_GUEST_HOST_MASK)
> + * without L0 trapping the change and updating vmcs12.
> + * This function returns the value we should put in vmcs12.guest_cr0. It's not
> + * enough to just return the current (vmcs02) GUEST_CR0. This may not be the
> + * guest cr0 that L1 thought it was giving its L2 guest - it is possible that
> + * L1 wished to allow its guest to set a cr0 bit directly, but we (L0) asked
> + * to trap this change and instead set just the read shadow. If this is the
> + * case, we need to copy these read-shadow bits back to vmcs12.guest_cr0, where
> + * L1 believes they already are.
> + */
> +static inline unsigned long
> +vmcs12_guest_cr0(struct kvm_vcpu *vcpu, struct vmcs_fields *vmcs12)
> +{
> +	unsigned long guest_cr0_bits =
> +		vcpu->arch.cr0_guest_owned_bits | vmcs12->cr0_guest_host_mask;
> +	return (vmcs_readl(GUEST_CR0)&  guest_cr0_bits) |
> +		(vmcs_readl(CR0_READ_SHADOW)&  ~guest_cr0_bits);
> +}

I think it's easier to keep cr0_guest_owned_bits up to date, and use 
kvm_read_cr0().  In fact, I think you have to do it so kvm_read_cr0() 
works correctly while in nested guest context.

> +
> +static inline unsigned long
> +vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs_fields *vmcs12)
> +{
> +	unsigned long guest_cr4_bits =
> +		vcpu->arch.cr4_guest_owned_bits | vmcs12->cr4_guest_host_mask;
> +	return (vmcs_readl(GUEST_CR4)&  guest_cr4_bits) |
> +		(vmcs_readl(CR4_READ_SHADOW)&  ~guest_cr4_bits);
> +}

Ditto.

> +
> +/*
> + * prepare_vmcs12 is called when the nested L2 guest exits and we want to
> + * prepare to run its L1 parent. L1 keeps a vmcs for L2 (vmcs12), and this
> + * function updates it to reflect the changes to the guest state while L2 was
> + * running (and perhaps made some exits which were handled directly by L0
> + * without going back to L1), and to reflect the exit reason.
> + * Note that we do not have to copy here all VMCS fields, just those that
> + * could have changed by the L2 guest or the exit - i.e., the guest-state and
> + * exit-information fields only. Other fields are modified by L1 with VMWRITE,
> + * which already writes to vmcs12 directly.
> + */
> +void prepare_vmcs12(struct kvm_vcpu *vcpu)
> +{
> +	struct vmcs_fields *vmcs12 = get_vmcs12_fields(vcpu);
> +
> +	/* update guest state fields: */
> +	vmcs12->guest_cr0 = vmcs12_guest_cr0(vcpu, vmcs12);
> +	vmcs12->guest_cr4 = vmcs12_guest_cr4(vcpu, vmcs12);
> +
> +	vmcs12->guest_dr7 = vmcs_readl(GUEST_DR7);
> +	vmcs12->guest_rsp = vmcs_readl(GUEST_RSP);
> +	vmcs12->guest_rip = vmcs_readl(GUEST_RIP);

kvm_register_read(), kvm_rip_read(), kvm_get_dr().

> +	vmcs12->guest_rflags = vmcs_readl(GUEST_RFLAGS);
> +

> +
> +	if(enable_ept){
> +		vmcs12->guest_physical_address =
> +			vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> +		vmcs12->guest_linear_address = vmcs_readl(GUEST_LINEAR_ADDRESS);
> +	}

Drop please.

> +
> +static int nested_vmx_vmexit(struct kvm_vcpu *vcpu, bool is_interrupt)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +	int efer_offset;
> +	struct vmcs_fields *vmcs01 = vmx->nested.vmcs01_fields;
> +
> +	if (!vmx->nested.nested_mode) {
> +		printk(KERN_INFO "WARNING: %s called but not in nested mode\n",
> +		       __func__);
> +		return 0;
> +	}
> +
> +	sync_cached_regs_to_vmcs(vcpu);

If you use kvm_rip_read() etc, this isn't needed.

> +
> +	prepare_vmcs12(vcpu);
> +	if (is_interrupt)
> +		get_vmcs12_fields(vcpu)->vm_exit_reason =
> +			EXIT_REASON_EXTERNAL_INTERRUPT;

Somewhat strange that a field is updated conditionally.

> +
> +	vmx->nested.current_vmcs12->launched = vmx->launched;
> +	vmx->nested.current_vmcs12->cpu = vcpu->cpu;
> +
> +	vmx->vmcs = vmx->nested.vmcs01;
> +	vcpu->cpu = vmx->nested.l1_state.cpu;
> +	vmx->launched = vmx->nested.l1_state.launched;
> +
> +	vmx->nested.nested_mode = false;
> +
> +	vmx_vcpu_load(vcpu, get_cpu());
> +	put_cpu();

Again need to extend the preempt disable region, probably to before you 
assign vmx->vmcs.

> +
> +	vcpu->arch.efer = vmx->nested.l1_state.efer;
> +	if ((vcpu->arch.efer&  EFER_LMA)&&
> +	    !(vcpu->arch.efer&  EFER_SCE))
> +		vcpu->arch.efer |= EFER_SCE;
> +
> +	efer_offset = __find_msr_index(vmx, MSR_EFER);
> +	if (update_transition_efer(vmx, efer_offset))
> +		wrmsrl(MSR_EFER, vmx->guest_msrs[efer_offset].data);

Use kvm_set_efer().  Just take the existing efer and use "host address 
space vm-exit control bit" for LMA and LME.

You may need to enter_lmode() or exit_lmode() manually.

> +	
> +	/*
> +	 * L2 perhaps switched to real mode and set vmx->rmode, but we're back
> +	 * in L1 and as it is running VMX, it can't be in real mode.
> +	 */
> +	vmx->rmode.vm86_active = 0;
> +

L2 cannot be in real mode, (vmx non-root mode does not support it).  L1 
cannot be in real mode (vmx root operation does not support it).  So no 
need for the assignment.

> +	/*
> +	 * We're running a regular L1 guest again, so we do the regular KVM
> +	 * thing: run vmx_set_cr0 with the cr0 bits the guest thinks it has.
> +	 * vmx_set_cr0 might use slightly different bits on the new guest_cr0
> +	 * it sets, e.g., add TS when !fpu_active.
> +	 * Note that vmx_set_cr0 refers to rmode and efer set above.
> +	 */
> +	vmx_set_cr0(vcpu, guest_readable_cr0(vmcs01));

Should be vmcs12->host_cr0.

> +	/*
> +	 * If we did fpu_activate()/fpu_deactive() during l2's run, we need to
> +	 * apply the same changes to l1's vmcs. We just set cr0 correctly, but
> +	 * now we need to also update cr0_guest_host_mask and exception_bitmap.
> +	 */
> +	vmcs_write32(EXCEPTION_BITMAP,
> +		(vmcs01->exception_bitmap&  ~(1u<<NM_VECTOR)) |
> +			(vcpu->fpu_active ? 0 : (1u<<NM_VECTOR)));

update_exception_bitmap()

> +	vcpu->arch.cr0_guest_owned_bits = (vcpu->fpu_active ? X86_CR0_TS : 0);

&= ~X86_CR0_TS removes the inside information about fpu_active.

> +	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);


> +
> +
> +	vmx_set_cr4(vcpu, vmx->nested.l1_state.cr4);
> +
> +	if (enable_ept) {
> +		vcpu->arch.cr3 = vmcs01->guest_cr3;
> +		vmcs_write32(GUEST_CR3, vmcs01->guest_cr3);
> +		vmcs_write64(EPT_POINTER, vmcs01->ept_pointer);
> +		vmcs_write64(GUEST_PDPTR0, vmcs01->guest_pdptr0);
> +		vmcs_write64(GUEST_PDPTR1, vmcs01->guest_pdptr1);
> +		vmcs_write64(GUEST_PDPTR2, vmcs01->guest_pdptr2);
> +		vmcs_write64(GUEST_PDPTR3, vmcs01->guest_pdptr3);

vmexits do not reload PDPTRs from a cache.  Instead, use kvm_set_cr3() 
unconditionally.

> +	} else {
> +		kvm_set_cr3(vcpu, vmx->nested.l1_state.cr3);
> +	}
> +
> +	kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs01->guest_rsp);
> +	kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs01->guest_rip);
> +
> +	kvm_mmu_reset_context(vcpu);
> +	kvm_mmu_load(vcpu);

kvm_mmu_load() is unnecessary, next guest entry will do it automatically 
IIRC.

> +
> +	if (unlikely(vmx->fail)) {
> +		/*
> +		 * When L1 launches L2 and then we (L0) fail to launch L2,
> +		 * we nested_vmx_vmexit back to L1, but now should let it know
> +		 * that the VMLAUNCH failed - with the same error that we
> +		 * got when launching L2.
> +		 */
> +		vmx->fail = 0;
> +		nested_vmx_failValid(vcpu, vmcs_read32(VM_INSTRUCTION_ERROR));
> +	} else
> +		nested_vmx_succeed(vcpu);
> +
> +	return 0;
> +}
> +
>   static struct kvm_x86_ops vmx_x86_ops = {
>   	.cpu_has_kvm_support = cpu_has_kvm_support,
>   	.disabled_by_bios = vmx_disabled_by_bios,


-- 
error compiling committee.c: too many arguments to function

next prev parent reply	other threads:[~2010-10-17 15:58 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-17 10:03 [PATCH 0/27] nVMX: Nested VMX, v6 Nadav Har'El
2010-10-17 10:04 ` [PATCH 01/27] nVMX: Add "nested" module option to vmx.c Nadav Har'El
2010-10-17 10:04 ` [PATCH 02/27] nVMX: Add VMX and SVM to list of supported cpuid features Nadav Har'El
2010-10-17 10:05 ` [PATCH 03/27] nVMX: Implement VMXON and VMXOFF Nadav Har'El
2010-10-17 12:24   ` Avi Kivity
2010-10-17 12:47     ` Nadav Har'El
2010-10-17 13:07   ` Avi Kivity
2010-10-17 10:05 ` [PATCH 04/27] nVMX: Allow setting the VMXE bit in CR4 Nadav Har'El
2010-10-17 12:31   ` Avi Kivity
2010-10-17 10:06 ` [PATCH 05/27] nVMX: Introduce vmcs12: a VMCS structure for L1 Nadav Har'El
2010-10-17 12:34   ` Avi Kivity
2010-10-17 13:18     ` Nadav Har'El
2010-10-17 13:29       ` Avi Kivity
2010-10-17 10:06 ` [PATCH 06/27] nVMX: Implement reading and writing of VMX MSRs Nadav Har'El
2010-10-17 12:52   ` Avi Kivity
2010-10-17 10:07 ` [PATCH 07/27] nVMX: Decoding memory operands of VMX instructions Nadav Har'El
2010-10-17 10:07 ` [PATCH 08/27] nVMX: Hold a vmcs02 for each vmcs12 Nadav Har'El
2010-10-17 13:00   ` Avi Kivity
2010-10-17 10:08 ` [PATCH 09/27] nVMX: Success/failure of VMX instructions Nadav Har'El
2010-10-17 10:08 ` [PATCH 10/27] nVMX: Implement VMCLEAR Nadav Har'El
2010-10-17 13:05   ` Avi Kivity
2010-10-17 13:25     ` Nadav Har'El
2010-10-17 13:27       ` Avi Kivity
2010-10-17 13:37         ` Nadav Har'El
2010-10-17 14:12           ` Avi Kivity
2010-10-17 14:14             ` Gleb Natapov
2010-10-17 10:09 ` [PATCH 11/27] nVMX: Implement VMPTRLD Nadav Har'El
2010-10-17 10:09 ` [PATCH 12/27] nVMX: Implement VMPTRST Nadav Har'El
2010-10-17 10:10 ` [PATCH 13/27] nVMX: Add VMCS fields to the vmcs12 Nadav Har'El
2010-10-17 13:15   ` Avi Kivity
2010-10-17 10:10 ` [PATCH 14/27] nVMX: Implement VMREAD and VMWRITE Nadav Har'El
2010-10-17 13:25   ` Avi Kivity
2010-10-17 10:11 ` [PATCH 15/27] nVMX: Prepare vmcs02 from vmcs01 and vmcs12 Nadav Har'El
2010-10-17 14:08   ` Avi Kivity
2011-02-08 12:13     ` Nadav Har'El
2011-02-08 12:27       ` Avi Kivity
2011-02-08 12:36         ` Nadav Har'El
2011-02-08 12:39           ` Avi Kivity
2011-02-08 12:27       ` Avi Kivity
2010-10-17 10:11 ` [PATCH 16/27] nVMX: Move register-syncing to a function Nadav Har'El
2010-10-17 10:12 ` [PATCH 17/27] nVMX: Implement VMLAUNCH and VMRESUME Nadav Har'El
2010-10-17 15:06   ` Avi Kivity
2010-10-17 10:12 ` [PATCH 18/27] nVMX: No need for handle_vmx_insn function any more Nadav Har'El
2010-10-17 10:13 ` [PATCH 19/27] nVMX: Exiting from L2 to L1 Nadav Har'El
2010-10-17 15:58   ` Avi Kivity [this message]
2010-10-17 10:13 ` [PATCH 20/27] nVMX: Deciding if L0 or L1 should handle an L2 exit Nadav Har'El
2010-10-20 12:13   ` Avi Kivity
2010-10-20 14:57     ` Avi Kivity
2010-10-17 10:14 ` [PATCH 21/27] nVMX: Correct handling of interrupt injection Nadav Har'El
2010-10-17 10:14 ` [PATCH 22/27] nVMX: Correct handling of exception injection Nadav Har'El
2010-10-17 10:15 ` [PATCH 23/27] nVMX: Correct handling of idt vectoring info Nadav Har'El
2010-10-17 10:15 ` [PATCH 24/27] nVMX: Handling of CR0.TS and #NM for Lazy FPU loading Nadav Har'El
2010-10-17 10:16 ` [PATCH 25/27] nVMX: Additional TSC-offset handling Nadav Har'El
2010-10-19 19:13   ` Zachary Amsden
2010-10-17 10:16 ` [PATCH 26/27] nVMX: Miscellenous small corrections Nadav Har'El
2010-10-17 10:17 ` [PATCH 27/27] nVMX: Documentation Nadav Har'El

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CBB1D09.8090406@redhat.com \
    --to=avi@redhat.com \
    --cc=gleb@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=nyh@il.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.